Abstract. Many of the stochastic models used in inference of phylogenetic trees from biological 
sequence data have polynomial parameterization maps. The image of such a map — the collection of 
joint distributions for a model — forms the model space. Since the parameterization is polynomial, 
the Zariski closure of the model space is an algebraic variety which is typically much larger than the 
model space, but has been usefully studied with algebraic methods. Of ultimate interest, however, is 
not the full variety, but only the model space. Here we develop complete semialgebraic descriptions 
of the model space arising from the k-state general Markov model on a tree, with slightly restricted 
parameters. Our approach depends upon both recently-formulated analogs of Cayley's hyperdeter- 
minant, and the construction of certain quadratic forms from the joint distribution whose positive 
(semi-)definiteness encodes information about parameter values. We additionally investigate the use 
of Sturm sequences for obtaining similar results. 
1. Introduction. Statistical inference of evolutionary relationships among organisms
from DNA sequence data is routinely performed using probabilistic models 
of sequence evolution along a tree. A site in a sequence is viewed as a 4-state (A,C,G, 
T) random variable, which undergoes state changes as it descends along the tree from 
an ancestral organism to its modern descendants. Such models exhibit a rich mathe- 
matical structure, which reflects both the combinatorial features of the tree, and the 
algebraic way in which stochastic matrices associated to edges of the tree are com- 
bined to produce a joint probability distribution describing sequences of the extant 
organisms. 

One thread in the extensive literature on such models has utilized the viewpoint 
of algebraic geometry to understand the probability distributions that may arise. 
This is natural, since the distributions are in the image of a polynomial map, and 
the image thus lies in an algebraic variety. The defining equations of this variety 
(which depend on the tree topology), are called phylogenetic invariants. That a 
probability distribution satisfies them can be taken as evidence that it arose from 
sequence evolution along the particular tree. Phylogenetic invariants and varieties 
have been extensively studied by many authors [13, 26, 17, 21, 20, 3, 34, 8, 12, 30, 11]
(see [7] for more references) with goals ranging from biological (improving data 
analysis) to statistical (establishing the identifiability of model parameters) to purely 
mathematical. 

However, it has long been understood that, in addition to the equalities of phylo- 
genetic invariants, inequalities should play a role in characterizing those distributions 
actually of interest for statistical purposes. Much of a phylogenetic variety is typically 
composed of points not arising from stochastic parameters, but rather from apply- 
ing the same polynomial parameterization map to complex parameters. Thus the 
model space — the set of probability distributions arising as the image of stochastic 
parameters on a tree — can be considerably smaller than the set of all probability 
distributions on the variety. An instructive recent computation [37] demonstrated 
that for the 2-state general Markov model on the 3-leaf tree, for example, the model 
space is only about 8% of the nonnegative real points on the variety. Inequalities can 
thus be crucial in determining if a probability distribution arises from a model. 

In the pioneering 1987 paper of Cavender and Felsenstein [13] polynomial equal- 
ities and inequalities are given that can test which of the 3 possible unrooted leaf- 
labeled 4-leaf trees might have produced a given probability distribution, and thus 
in principle determine evolutionary relationships between 4 organisms. Nonetheless, 
despite many advances in understanding phylogenetic invariants in the intervening 
years, little has been accomplished in finding or understanding the necessary inequal- 
ities. The potential usefulness of such inequalities, meanwhile, has been demonstrated 
in [IH] , where an inequality that holds for the 2-state model on all tree topologies plays 
a key role in studying loss of biodiversity under species extinction. In [5] a small num- 
ber of inequalities, dependent on the tree, were used to show that for certain mixture 
models trees were identifiable from probability distributions. 

Recent independent works by Smith and Zwiernik [37] and by Klaere and Liebscher
[23] provided the first substantial progress on the general problem of finding 
sufficient inequalities to describe the model space. Both groups successfully formu- 
lated inequalities for the 2-state general Markov model on trees, using different view- 
points. While the 2-state model has some applicability to DNA sequences, through 
a purine/pyrimidine encoding of nucleotides, it is unfortunately not clear how to extend
these works to the more general k-state model, or even to the particular k = 4 
model that is directly applicable to DNA. Features of the statistical framework in [37] 
make generalizing to more states highly problematic, while the formulation of [23] 
involves beating through a thicket of algebraic details which are similarly difficult to 
generalize. 

In this work we provide a third approach to understanding the model space of the 
general Markov model on trees which has the advantage of extending from the 2-state 
to the k-state model with little modification. Our goal is a semialgebraic description 
(given by a boolean combination of polynomial equalities and inequalities) of the set 
of probability distributions that arise on a specific tree. Such a description exists by 
the Tarski-Seidenberg Theorem [36, 31], since the stochastic parameter space for any 
k-state general Markov model is a semialgebraic set, so its image under the polynomial 
parameterization map must be as well. However, we seek an explicit description, and 
this theorem does not provide a useful means of obtaining it. 

We describe below two methods for obtaining such a semialgebraic model description.
In one approach, which applies equally easily to all k and all binary trees, we 
obtain inequalities using a recently-formulated analog of Cayley's 2x2x2 hyperdeterminant
from [1], and the construction of certain quadratic forms from the joint 
distribution whose positive (semi-)definiteness encodes information about parameter 
values. We note that the appearance of the hyperdeterminant in both [37] and [23] 
motivated the work of [1] , but that our introduction of quadratic forms in this paper 
is an equally essential tool for obtaining our results. Moreover, we do not see direct 
precursors of this idea in either [37] or [23] . 

We also describe an alternative method using Sturm sequences for univariate 
polynomials to obtain inequalities. Specifically, we construct polynomials in the en- 
tries of a probability distribution whose roots are exactly a subset of the numerical 
parameters, and Sturm theory leads to inequalities stating that the roots lie in the 
interval (0, 1), as the parameters must. Although for the 2-state model this leads to 
a complete semialgebraic description of the model on a 3-leaf tree, for higher k it becomes
more unwieldy. Nonetheless, this approach can produce inequalities of smaller 
degree than those found using quadratic forms, so we consider it a potentially useful 
technique. 

In both approaches, we must impose some restrictions on the set of stochastic 
parameters in order to give our semialgebraic conditions. We thus formulate a notion 
of nonsingular parameters and mostly restrict to considering them for our results. In 
the k = 2 case this notion is particularly natural from a statistical point of view, 
though it is slightly less so for higher k. Indeed, an understanding of why this notion 
is needed algebraically illuminates, we believe, the difficulties of passing from 2-state 
results to k-state results. 

This paper is organized as follows: In §2 we formally introduce the general Markov 
model on trees and set basic notations and terminology, including the notion of
nonsingular parameters. In §3 we give a semialgebraic description of the general Markov 
model on the 3-leaf tree using the work of [1] and Sylvester's theorem on quadratic 
forms, a description that is made complete for the 2-state model, but holds only for 
nonsingular parameters of the k-state model. Additionally, we discuss connections to 
several previous works on the 2-state model [29, 37, 23]. In §4 we use Sturm sequences 
to give partial semialgebraic descriptions of 3-leaf model spaces, and develop several 
examples. In §5 we give the main result: a semialgebraic description of the k-state 
general Markov model on n-leaf trees for nonsingular parameters. For the 2-state 
model we prove a slightly stronger result that drops the nonsingularity assumption. 

2. Definitions and Notations. 

2.1. The general Markov model on trees. We review the k-state general 
Markov model on trees, GM(k), whose parameters consist of a combinatorial object, 
a tree, and a collection of numerical parameters that are associated to a rooted version 
of the tree. 

Let $T = (V, E)$ be a binary tree with leaves $L \subseteq V$, $|L| = n$, and $\{X_a\}_{a \in V}$ a 
collection of discrete random variables associated to the nodes, all with state space 
$[k] = \{1, 2, \dots, k\}$. Distinguish an internal node r of T to serve as its root, and direct 
all edges of T away from r. (Often this model is presented with the root as a node of 
valence 2 which is introduced by subdividing some edge. However, under very mild 
assumptions this leads to the same probability distributions we consider here [33, 3], 
so we avoid that complication.) Though necessary for parameterizing the model, the 
choice of r will not matter in our final results, as will be shown in §5.

For a tree T rooted at r, numerical parameters $\{\pi, \{M_e\}_{e \in E}\}$ for the GM(k) 
model on T are: 

(i) A root distribution row vector $\pi = (\pi_1, \dots, \pi_k)$, with nonnegative entries 
summing to 1; 

(ii) Markov matrices $M_e$, with nonnegative entries and row sums equal to 1. 

The vector $\pi$ specifies the distribution of the random variable $X_r$, i.e., 
$\pi_i = \operatorname{Prob}(X_r = i)$, and the Markov matrices $M_e$, for $e = (a_e, b_e) \in E$, give transition 
probabilities $M_e(i,j) = \operatorname{Prob}(X_{b_e} = j \mid X_{a_e} = i)$ of the various state changes in 
passing from the parent vertex $a_e$ to the child vertex $b_e$. Letting $X = (X_a)_{a \in V}$, the 
joint probability distribution at all nodes of T is thus 

$$\operatorname{Prob}(X = \mathbf{j}) = \pi_{j_r} \prod_{e \in E} M_e(j_{a_e}, j_{b_e}).$$
By marginalizing over all variables at internal nodes of T, we obtain the joint distribution,
P, of states at the leaves of T; if $\mathbf{k} \in [k]^{|L|}$ is an assignment of states to leaf 
variables, then 

$$P(\mathbf{k}) = \sum_{\mathbf{m} \in [k]^{|V| - |L|}} \operatorname{Prob}(X = (\mathbf{k}, \mathbf{m})),$$

where $(\mathbf{k}, \mathbf{m})$ is an assignment of states to all the vertices of T compatible with $\mathbf{k}$. It 
is natural to view P as an n-dimensional $k \times \cdots \times k$ array, or tensor, with one index 
for each leaf of the tree. 
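
As a concrete illustration of this marginalization, the following minimal sketch computes the leaf tensor P for the 3-leaf star tree rooted at its central node, where the only hidden variable is the root. It is written in Python with NumPy, which is our choice for illustration and not something the paper uses; the function names and the randomly drawn parameters are hypothetical.

```python
import numpy as np

def random_stochastic_matrix(k, rng):
    """Return a k x k matrix with nonnegative entries and rows summing to 1."""
    M = rng.random((k, k))
    return M / M.sum(axis=1, keepdims=True)

def star_tree_distribution(pi, M1, M2, M3):
    """Leaf tensor P(j1, j2, j3) = sum_i pi[i] * M1[i, j1] * M2[i, j2] * M3[i, j3].

    The sum over i is the marginalization over the hidden root state.
    """
    return np.einsum("i,ia,ib,ic->abc", pi, M1, M2, M3)

# Illustrative (assumed) stochastic parameters for k = 4 states.
rng = np.random.default_rng(0)
k = 4
pi = rng.random(k)
pi /= pi.sum()
M1, M2, M3 = (random_stochastic_matrix(k, rng) for _ in range(3))

P = star_tree_distribution(pi, M1, M2, M3)
```

With stochastic parameters the resulting array has nonnegative entries summing to 1, so it is a probability distribution in the sense used in this section.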

For fixed T and choice of r, we use $\psi_T$ to denote the parameterization map 

$$\psi_T : \{\pi, \{M_e\}_{e \in E}\} \mapsto P.$$

That the coordinate functions of $\psi_T$ are polynomial is obvious, but essential to our 
work here. Note that we may naturally extend the domain of the polynomial map to 
larger sets, by dropping the nonnegativity assumptions in (i) and (ii), but retaining 
the condition that rows must sum to 1. We will consider real parameters and a real 
parameterization map, as well as complex parameters and a complex parameterization 
map. In contrast, we refer to the original probabilistic model as having stochastic 
parameters. Since the parameterization maps are all given by the same formula, we 
use $\psi_T$ to denote them all, but will always indicate the current domain of interest. 

The image of complex, real, or stochastic parameters under $\psi_T$ is an n-dimensional 
$k \times \cdots \times k$ tensor, whose $k^n$ entries sum to 1. When parameters are not stochastic, this 
tensor generally does not specify a probability distribution, as there can be negative 
or complex entries. We refer to any tensor whose entries sum to 1, regardless of 
whether the entries are complex, real, or nonnegative, as a distribution, but reserve 
the term probability distribution for a nonnegative distribution. With this language, 
the image of complex parameters under $\psi_T$ is a distribution, but may or may not 
be a probability distribution. Similarly, while the matrix parameters Me have rows 
summing to one even for complex parameters, we reserve the term Markov matrix 
exclusively for the stochastic setting. 

2.2. Algebraic and semialgebraic model descriptions. Most previous algebraic
analysis of the GM(k) model has focused on the algebraic variety associated 
to it for each choice of tree T. With this viewpoint one is essentially passing from the 
parameterization of the model, as given above, to an implicit description of the image 
of the parameterization as a zero set of certain polynomial functions, traditionally 
called phylogenetic invariants [13, 26, 7]. 

Whether one considers stochastic, real, or complex parameters, the collection 
of phylogenetic invariants for GM(k) on a tree T are the same. Thus they cannot 
distinguish probability distributions that arise from stochastic parameters from those 
arising from non-stochastic real or complex ones. To complicate matters further, there 
exist distributions that satisfy all phylogenetic invariants for the model on a given tree, 
but are not even in the image of complex parameters. Though the algebraic issues 
behind this are well understood, they prevent classical algebraic geometry from being 
a sufficient tool to focus exclusively on the distributions of statistical interest. 

To gain a more detailed understanding, we seek to refine the algebraic description 
of the model given by phylogenetic invariants into a semialgebraic description: In 
addition to finding polynomials vanishing on the image of the parameterization (or 
equivalently polynomial equalities holding at all points on the image), we also seek 
polynomial inequalities sufficient to distinguish the stochastic image precisely. 
Recall that a subset of $\mathbb{R}^n$ is called a semialgebraic set if it is a boolean combination
of sets each of which is defined by a single polynomial equality or inequality. 
The Tarski-Seidenberg Theorem [36, 31] implies that the image of a semialgebraic set 
under a polynomial map is also semialgebraic. 

Since for all T the stochastic parameter space of $\psi_T$ is clearly semialgebraic, this 
implies that semialgebraic descriptions exist for the images of the $\psi_T$. Determining 
such descriptions explicitly is our goal. 

2.3. Nonsingular parameters, positivity, and independence. Some of our 
results will be stated with additional mild conditions placed on the allowed parameters 
for the GM(k) model. We state these conditions here, and explore their meaning. 

Definition 2.1. A choice $\{\pi, \{M_e\}_{e \in E}\}$ of stochastic, real, or complex parameters
for GM(k) on a tree T with root r is said to be nonsingular provided 

(i) at every (hidden or observed) node a, the marginal distribution $v_a$ of $X_a$ 
has no zero entry, and 

(ii) for every edge e, the matrix $M_e$ is nonsingular. 

Parameters which are not nonsingular are said to be singular. 

For stochastic parameters, the first condition in this definition can be replaced 
with a simpler one: 

(i') the root distribution $\pi$ has no zero entry. 

Statement (i) follows from (i') and (ii) inductively, since if all entries of $v_a$ are positive 
and $M_{(a,b)}$ is a nonsingular Markov matrix, then the distribution $v_b = v_a M_{(a,b)}$ at a 
child b of a has positive entries. However, for complex or real parameters requirement 
(i) is not implied by (i') and (ii), as a simple example shows: $v_a = (1/2, 1/2)$ and 

$$M_{(a,b)} = \begin{pmatrix} s & 1 - s \\ 2 - s & s - 1 \end{pmatrix}$$

are singular parameters since $v_b = (1, 0)$, even though $v_a$ 
has no zero entries and $M_{(a,b)}$ is nonsingular for $s \neq 1$. 
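
The example above is easy to check numerically. The sketch below (Python with NumPy, an illustrative choice on our part; the value s = 3 and the function name are arbitrary assumptions) confirms that the rows of the matrix sum to 1, that its determinant is 2(s - 1), and that the child marginal has a zero entry.

```python
import numpy as np

def example_parameters(s):
    """Return (v_a, M_ab) from the example: real, non-stochastic parameters."""
    v_a = np.array([0.5, 0.5])
    M_ab = np.array([[s, 1.0 - s],
                     [2.0 - s, s - 1.0]])
    return v_a, M_ab

s = 3.0                          # any s != 1 gives a nonsingular matrix
v_a, M_ab = example_parameters(s)
row_sums = M_ab.sum(axis=1)      # both rows sum to 1
det_M = np.linalg.det(M_ab)      # equals 2 * (s - 1)
v_b = v_a @ M_ab                 # marginal at the child node b
```

Here condition (ii) of Definition 2.1 holds and (i') holds at node a, yet condition (i) fails at node b, which is why (i) cannot be weakened to (i') outside the stochastic setting.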

It is also natural to require that all numerical parameters of GM(k) on a tree 
T be strictly positive. This means that all states may occur at the root, and every 
state change is possible in passing along any edge of the tree. This assumption is 
plausible from a modeling point of view, and can be desirable for technical statistical 
issues as well. Note that positivity of parameters does not ensure nonsingularity, 
since a Markov matrix may be singular despite all its entries being greater than zero. 
Similarly, nonsingularity of parameters does not ensure positivity since a nonsingular 
Markov matrix may have zero entries. 

Given a joint probability distribution of random variables, two subsets of vari- 
ables are independent when the marginal distribution for the union of the sets is the 
product of the marginal distributions for the two sets individually. We also use this 
term, in a nonstandard way, to apply to complex or real distributions when the same 
factorization holds. 

To illustrate this usage, consider a tree T with two nodes, r, a and one edge (r, a). 
For complex parameters $\pi$ and $M_{(r,a)}$, the joint distribution of $X_r$ and $X_a$ is given 
by the matrix 

$$P = \operatorname{diag}(\pi) M_{(r,a)}.$$

Then the variables are independent exactly when P is a rank 1 matrix: $P = \pi^T v_a$. 
For k = 2 this occurs precisely when the parameters are singular. For k > 2, however, 
independence implies the parameters are singular, but not vice versa. In general, 
singular parameters ensure that P has rank strictly less than k, but not that P has 
rank 1. 
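
A small numerical sketch makes the difference between k = 2 and k > 2 visible. It uses Python with NumPy (our illustrative choice), and the specific matrices are assumptions picked only so that each Markov matrix is singular: for k = 2 the rows must coincide, while for k = 3 the third row is the average of the first two.

```python
import numpy as np

# k = 2: a singular Markov matrix has two equal rows, so P has rank 1.
pi2 = np.array([0.4, 0.6])
M2 = np.array([[0.3, 0.7],
               [0.3, 0.7]])
P2 = np.diag(pi2) @ M2
rank_P2 = np.linalg.matrix_rank(P2)
v_a2 = pi2 @ M2                      # marginal distribution of X_a

# k = 3: singular (rank 2) Markov matrix with all entries positive.
pi3 = np.array([0.2, 0.3, 0.5])
M3 = np.array([[0.60, 0.30, 0.10],
               [0.10, 0.30, 0.60],
               [0.35, 0.30, 0.35]])  # third row = average of the first two
P3 = np.diag(pi3) @ M3
rank_M3 = np.linalg.matrix_rank(M3)
rank_P3 = np.linalg.matrix_rank(P3)
```

In the k = 2 case P2 factors as the outer product of pi2 and v_a2, so the two variables are independent. In the k = 3 case the parameters are singular but P3 has rank 2, so the variables are not independent.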

These comments easily extend to larger trees to give the following. 

Proposition 2.2. Suppose $P = \psi_T(\pi, \{M_e\})$ for a choice of complex GM(k) 
parameters on an n-leaf tree T. If the parameters are nonsingular, then there is no 
proper partition of the indices of P into independent sets. For k = 2, the converse 
also holds. 

That the converse is false for k > 2 is a complicating factor for the generalization 
of our results from the k = 2 case. Indeed, this is the reason we ultimately restrict to 
nonsingular parameters. 

In closing this section, we note that for any $P \in \operatorname{Im}(\psi_T)$, there is an inherent and 
well-understood source of non-uniqueness of parameters giving rise to P, sometimes 
called 'label-swapping.' Since internal nodes of T are unobservable variables, the dis- 
tribution P is computed by summing over all assignments of states to such variables. 
As a result, if the state names were permuted for such a variable, and corresponding 
changes made in numerical parameters, P is left unchanged. Thus parameters leading 
to P can be determined at most up to such permutations. 

In the case of nonsingular parameters, label-swapping is the only source of non- 
uniqueness of parameters leading to P [TS]. (See also [551 [5]). For singular parameters 
there are additional sources of non-uniqueness. 

2.4. Marginalizations, slices, group actions, and flattenings. Viewing 
probability distributions on n variables as n-dimensional tensors gives natural associations
between statistical notions and tensor operations. For example, summing 
tensor entries over an index, or a collection of indices, corresponds to marginalizing 
over a variable, or collection of variables. Considering only those entries with a fixed 
value of an index, or collection of indices, corresponds (after renormalization) to con- 
ditioning on an observed variable, or collection of variables. Rearranging array entries 
into a new array, with fewer dimensions but larger size, corresponds to agglomerating 
several variables into a composite one with larger state space. Here we introduce the 
necessary notation to formalize these tensor operations. 

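Definition 2.3. For an n-dimensional $k \times \cdots \times k$ tensor P, integer $i \in [n]$, and 
vector $v = (v_1, \dots, v_k)$, define the (n - 1)-dimensional tensor $P *_i v$ by 

$$(P *_i v)(j_1, \dots, \hat{j_i}, \dots, j_n) = \sum_{\ell=1}^{k} v_\ell \, P(j_1, \dots, j_{i-1}, \ell, j_{i+1}, \dots, j_n),$$

where $\hat{\ }$ denotes omission. 

Thus, the $\ell$th slice of P in the ith index is defined by $P_{\cdots \ell \cdots} = P *_i e_\ell$, where 
$e_\ell$ is the $\ell$th standard basis vector, and the ith marginalization of P is $P_{\cdots + \cdots} = P *_i \mathbf{1}$, 
where $\mathbf{1}$ is the vector of all 1s. 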

The above product of a tensor and vector extends naturally to tensors and ma- 
trices. 

Definition 2.4. For an n-dimensional $k \times \cdots \times k$ tensor P and $k \times k$ matrix 
M, define the n-dimensional tensor $P *_i M$ by 

$$(P *_i M)(j_1, \dots, j_n) = \sum_{\ell=1}^{k} P(j_1, \dots, j_{i-1}, \ell, j_{i+1}, \dots, j_n) \, M(\ell, j_i).$$
If the above operations on a tensor by vectors or matrices are performed in different
indices, then they commute. This allows the use of n-tuple notation for the 
operation of matrices in all indices of a tensor, such as the following: 

$$P \cdot (M_1, M_2, \dots, M_n) = (\cdots((P *_1 M_1) *_2 M_2) \cdots) *_n M_n.$$

Although the $M_i$ need not be invertible, restricting to that case gives the natural 
(right) group action of $GL(k, \mathbb{C})^n$ on $k \times \cdots \times k$ tensors. This generalizes the familiar 
operation on 2-dimensional tensors P, i.e., on matrices, where 

$$P \cdot (M_1, M_2) = (P *_1 M_1) *_2 M_2 = M_1^T P M_2.$$
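
The products of Definitions 2.3 and 2.4 and the n-tuple action can be sketched in a few lines of Python with NumPy (an illustrative choice on our part). The function names are hypothetical, and indices are 0-based in the code whereas the text counts from 1.

```python
import numpy as np

def times_vector(P, i, v):
    """P *_i v: contract index i (0-based) of P with v; that index is dropped."""
    return np.tensordot(P, v, axes=([i], [0]))

def times_matrix(P, i, M):
    """P *_i M: result(..., j_i, ...) = sum_l P(..., l, ...) * M[l, j_i]."""
    out = np.tensordot(P, M, axes=([i], [0]))   # the new index is placed last
    return np.moveaxis(out, -1, i)              # move it back to position i

def act(P, matrices):
    """P . (M_1, ..., M_n): apply M_i in index i, for every index in turn."""
    for i, M in enumerate(matrices):
        P = times_matrix(P, i, M)
    return P

rng = np.random.default_rng(0)
k = 3
T = rng.random((k, k, k))
A, B = rng.random((k, k)), rng.random((k, k))
e1 = np.eye(k)[1]                      # a standard basis vector

slice_1 = times_vector(T, 1, e1)       # slice of T in the second index
marginal_1 = times_vector(T, 1, np.ones(k))
Q = rng.random((k, k))                 # a 2-dimensional tensor, i.e. a matrix
```

On a matrix Q the action reduces to `A.T @ Q @ B`, and operations in different indices can be applied in either order, as stated above.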

If $v \in \mathbb{C}^k$, then Diag(v) denotes the 3-dimensional $k \times k \times k$ diagonal tensor whose 
only nonzero entries are the $v_i$ in the (i, i, i) positions. That this notion is useful for 
the GM(k) model is made clear by the observation that for a 3-leaf star tree T, rooted 
at the central node, 

$$\psi_T(\pi, \{M_1, M_2, M_3\}) = \operatorname{Diag}(\pi) \cdot (M_1, M_2, M_3). \tag{2.1}$$



If P is an n-dimensional $k \times \cdots \times k$ tensor and $[n] = A \sqcup B$ is a disjoint union 
of nonempty sets, then the flattening of P with respect to this bipartition, $\operatorname{Flat}_{A|B}(P)$, 
is the $k^{|A|} \times k^{|B|}$ matrix with rows indexed by $\mathbf{i} \in [k]^{|A|}$ and columns indexed by 
$\mathbf{j} \in [k]^{|B|}$, with 

$$\operatorname{Flat}_{A|B}(P)(\mathbf{i}, \mathbf{j}) = P(\mathbf{k}),$$

where $\mathbf{k} \in [k]^n$ has entries matching those of $\mathbf{i}$ and $\mathbf{j}$, appropriately ordered. Thus 
the entries of P are simply rearranged into a matrix, in a manner consistent with the 
original tensor structure. When P specifies a joint distribution for n random variables, 
this flattening corresponds to treating the variables in A and B as two agglomerate 
variables, with state spaces the product of the state spaces of the individual variables. 

Notations such as $\operatorname{Flat}_{1|23}(P)$, for example, will be used to denote the matrix 
flattening obtained from a 3-dimensional tensor using the partition of indices A = {1}, 
B = {2,3}. If e is an edge in an n-leaf tree, then e naturally induces a bipartition 
of the leaves, by removing the edge and grouping leaves according to the resulting 
connected components. A flattening for such a bipartition is denoted by $\operatorname{Flat}_e(P)$. 

Finally, we note that flattenings naturally occur in the notion of independence: 
If $[n] = A \sqcup B$, then the sets are independent precisely when $\operatorname{Flat}_{A|B}(P)$ is a rank 1 
matrix. 
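
A flattening amounts to a transpose followed by a reshape. The sketch below, in Python with NumPy (our illustrative choice, with 0-based indices and a hypothetical function name), also illustrates the rank 1 characterization of independence on an assumed example in which the first variable is independent of the other two.

```python
import numpy as np

def flatten(P, A):
    """Flat_{A|B}(P): rows indexed by the indices in A, columns by the rest.

    A is a list of 0-based tensor indices; B is its complement, in increasing order.
    """
    n, k = P.ndim, P.shape[0]
    A = list(A)
    B = [i for i in range(n) if i not in A]
    return np.transpose(P, A + B).reshape(k ** len(A), k ** len(B))

rng = np.random.default_rng(0)
k = 3
p = rng.random(k)
p /= p.sum()                       # distribution of variable 1
Q = rng.random((k, k))
Q /= Q.sum()                       # joint distribution of variables 2 and 3

P = np.einsum("a,bc->abc", p, Q)   # variable 1 independent of {2, 3}

flat_1_23 = flatten(P, [0])        # Flat_{1|23}(P), a 3 x 9 matrix
flat_2_13 = flatten(P, [1])        # Flat_{2|13}(P), a 3 x 9 matrix
rank_1_23 = np.linalg.matrix_rank(flat_1_23)
rank_2_13 = np.linalg.matrix_rank(flat_2_13)
```

Only the flattening that matches the independent split has rank 1; for a generic choice of Q the other flattening has larger rank.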

3. GM(k) on 3-leaf trees. In this section we derive a semialgebraic description 
of GM(k) on the 3-leaf tree, the smallest example of interest. Results for the 3-leaf 
tree also serve as a building block for the study of the model on larger trees in §5. 
For this section, then, T is fixed, with leaves 1, 2, 3 and root r at the central node. 

When k = 2, Cayley's hyperdeterminant plays a critical role, as has already been 
highlighted in [35]. Though our formulation will be different, we take the hyperdeterminant
as our starting point. For any 2x2x2 tensor $A = (a_{ijk})$, the hyperdeterminant 
$\Delta(A)$ is 

$$\begin{aligned}
\Delta(A) = {} & (a_{111}^2 a_{222}^2 + a_{112}^2 a_{221}^2 + a_{121}^2 a_{212}^2 + a_{122}^2 a_{211}^2) \\
& - 2(a_{111} a_{112} a_{221} a_{222} + a_{111} a_{121} a_{212} a_{222} + a_{111} a_{122} a_{211} a_{222} \\
& \qquad + a_{112} a_{121} a_{212} a_{221} + a_{112} a_{122} a_{221} a_{211} + a_{121} a_{122} a_{212} a_{211}) \\
& + 4(a_{111} a_{122} a_{212} a_{221} + a_{112} a_{121} a_{211} a_{222}).
\end{aligned}$$

The function $\Delta$ has the $GL(2, \mathbb{C})^3$-invariance property 

$$\Delta(P \cdot (g_1, g_2, g_3)) = \det(g_1)^2 \det(g_2)^2 \det(g_3)^2 \Delta(P). \tag{3.1}$$

This fact, combined with a study of canonical forms for $GL(2, \mathbb{C})^3$-orbit representatives,
leads to the following theorem. 

Theorem 3.1 (dH, Theorem 7.1). A complex 2x2x2 tensor P is in the 
$GL(2, \mathbb{C})^3$-orbit of $D = \operatorname{Diag}(1, 1)$ if, and only if, $\Delta(P) \neq 0$. A real tensor is in the 
$GL(2, \mathbb{R})^3$-orbit of D if, and only if, $\Delta(P) > 0$. 

Suppose that k = 2 and $P = \psi_T(\pi, \{M_1, M_2, M_3\})$ arises from nonsingular parameters
on T. Then equation (2.1) states $P = \operatorname{Diag}(\pi) \cdot (M_1, M_2, M_3)$, but letting 
$M_1' = \operatorname{diag}(\pi) M_1$ we also have 

$$P = D \cdot (M_1', M_2, M_3).$$

Therefore $\Delta(P) > 0$ by the forward implication of Theorem 3.1. This hyperdeterminantal
inequality can thus be included in building a semialgebraic description of the 
GM(2) model when restricted to nonsingular parameters. 
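
The hyperdeterminant and the consequence of (3.1) used here can be checked numerically. The following is a minimal sketch in Python with NumPy (our illustrative choice); the function name and the particular stochastic parameters are assumptions, and array indices are 0-based, so that $a_{111}$ corresponds to `a[0, 0, 0]` and $a_{222}$ to `a[1, 1, 1]`.

```python
import numpy as np

def hyperdeterminant(a):
    """Cayley's hyperdeterminant of a 2x2x2 array (0-based indices)."""
    return (
        a[0, 0, 0]**2 * a[1, 1, 1]**2 + a[0, 0, 1]**2 * a[1, 1, 0]**2
        + a[0, 1, 0]**2 * a[1, 0, 1]**2 + a[0, 1, 1]**2 * a[1, 0, 0]**2
        - 2 * (a[0, 0, 0] * a[0, 0, 1] * a[1, 1, 0] * a[1, 1, 1]
               + a[0, 0, 0] * a[0, 1, 0] * a[1, 0, 1] * a[1, 1, 1]
               + a[0, 0, 0] * a[0, 1, 1] * a[1, 0, 0] * a[1, 1, 1]
               + a[0, 0, 1] * a[0, 1, 0] * a[1, 0, 1] * a[1, 1, 0]
               + a[0, 0, 1] * a[0, 1, 1] * a[1, 1, 0] * a[1, 0, 0]
               + a[0, 1, 0] * a[0, 1, 1] * a[1, 0, 1] * a[1, 0, 0])
        + 4 * (a[0, 0, 0] * a[0, 1, 1] * a[1, 0, 1] * a[1, 1, 0]
               + a[0, 0, 1] * a[0, 1, 0] * a[1, 0, 0] * a[1, 1, 1])
    )

# The diagonal tensor D = Diag(1, 1).
D = np.zeros((2, 2, 2))
D[0, 0, 0] = D[1, 1, 1] = 1.0

# Assumed nonsingular stochastic parameters for GM(2) on the 3-leaf tree.
pi = np.array([0.4, 0.6])
M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.8, 0.2], [0.3, 0.7]])
M3 = np.array([[0.7, 0.3], [0.1, 0.9]])
P = np.einsum("i,ia,ib,ic->abc", pi, M1, M2, M3)

delta_P = hyperdeterminant(P)
# From (3.1) with g1 = diag(pi) M1, g2 = M2, g3 = M3 and Delta(D) = 1:
predicted = (pi.prod() * np.linalg.det(M1) * np.linalg.det(M2) * np.linalg.det(M3)) ** 2
```

Since `predicted` is the square of a nonzero real number, the value of $\Delta(P)$ is strictly positive for these parameters, in line with the inequality above.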

However, the inequality $\Delta(P) > 0$ yields a weaker conclusion than that P arises 
from stochastic, or even real, nonsingular parameters, so additional inequalities are 
needed for a semialgebraic model description. 

Nonetheless, motivated by the role the hyperdeterminant plays in the semialgebraic
description of the GM(2) model, in a separate work Allman, Jarvis, Rhodes, 
and Sumner [1] construct generalizations of $\Delta$ for k > 2. These functions are defined by 

$$f_i(P; x) = \det(H_x(\det(P *_i x))),$$

where x is a vector of auxiliary variables, and $H_x$ denotes the Hessian operator. They 
also have invariance properties under $GL(k, \mathbb{C})^3$ such as 

$$f_3(P \cdot (g_1, g_2, g_3); x) = \det(g_1)^k \det(g_2)^k \det(g_3)^2 f_3(P; g_3 x).$$

The next theorem establishes that the nonvanishing of these polynomials, in conjunction
with the vanishing of some others, identifies the orbit of $\operatorname{Diag}(\mathbf{1})$, and thus 
is an analog of Theorem 3.1 for larger k. 

Theorem 3.2 ([1]). A complex $k \times k \times k$ tensor P lies in the $GL(k, \mathbb{C})^3$-orbit of 
$\operatorname{Diag}(\mathbf{1})$ if, and only if, for some $i \in \{1, 2, 3\}$, 

(i) $(P *_i e_j) \operatorname{adj}(P *_i x) (P *_i e_\ell) - (P *_i e_\ell) \operatorname{adj}(P *_i x) (P *_i e_j) = 0$ for all 
$j, \ell \in [k]$. Here adj denotes the classical adjoint, and equality means as a matrix of 
polynomials in x. 

(ii) $f_i(P; x)$ is not identically zero as a polynomial in x. 
Moreover, if the enumerated conditions hold for one i, then they hold for all. 

When k > 2 the $GL(k, \mathbb{C})^3$-orbit of $\operatorname{Diag}(\mathbf{1})$ is not dense among all $k \times k \times k$ tensors; 
rather its closure is a lower dimensional subvariety. This explains the necessity of the 
equalities in item (i). In the case k = 2 these equalities simplify to 0 = 0 and thus 
hold for all tensors. One can further verify that if k = 2 then $f_i = \Delta$, so that Theorem 
3.2 includes the first statement of Theorem 3.1. 

One might hope that the polynomials $f_i(P; x)$ had a sign property similar to that 
given in Theorem 3.1 for $\Delta(P)$, so that a simple test could further distinguish the 
image of nonsingular real parameters. For k = 3, using functions related to the $f_i$, a 
semialgebraic description of the $GL(k, \mathbb{R})^3$-orbit of $\operatorname{Diag}(\mathbf{1})$ can in fact be obtained in 
this manner (see [1]), giving a complete analog of Theorem 3.1. However, for k > 3 
no analog is known. 

Finally, we emphasize that for k > 2 the functions $f_i$ are not the ones usually 
referred to as hyperdeterminants [19], but rather a different generalization of $\Delta$. 

With semialgebraic conditions ensuring a tensor is in the $GL(k, \mathbb{C})^3$-orbit of 
$\operatorname{Diag}(\mathbf{1})$ in hand, we wish to supplement these to ensure it arises from nonsingular 
stochastic parameters. We address this in several steps; first, we give requirements 
that a tensor is the image of complex parameters under $\psi_T$, and then that these 
parameters be nonnegative. 

Proposition 3.3. Let P be a complex $k \times k \times k$ distribution. Then P is in the 
image of nonsingular complex parameters for GM(k) on the 3-leaf tree if, and only if, 
P is in the $GL(k, \mathbb{C})^3$-orbit of $\operatorname{Diag}(\mathbf{1})$ and $\det(P *_i \mathbf{1}) \neq 0$ for i = 1, 2, 3. Moreover, 
the parameters are unique up to label swapping. 

Proof. To establish the claimed reverse implication, suppose 
$P = \operatorname{Diag}(\mathbf{1}) \cdot (g_1, g_2, g_3)$ for some $g_i \in GL(k, \mathbb{C})$, and let $r^i = g_i \mathbf{1}$ denote the vector of row sums of 
$g_i$. A computation shows that 

$$P *_3 \mathbf{1} = g_1^T \operatorname{diag}(r^3) g_2.$$

Thus $\det(P *_3 \mathbf{1}) \neq 0$ is equivalent to the row sums of $g_3$ being nonzero, and similarly 
for the other $g_i$. 

Now $M_i = \operatorname{diag}(r^i)^{-1} g_i$ is a complex matrix with row sums equal to one. Letting 
$\pi = (\prod_{i=1}^{3} r^i_1, \dots, \prod_{i=1}^{3} r^i_k)$ be the vector of entry-wise products of the $r^i$, the entries 
of $\pi$ are nonzero and 

$$P = \operatorname{Diag}(\pi) \cdot (M_1, M_2, M_3).$$

Since P is a distribution, 

$$\begin{aligned}
1 &= ((P *_1 \mathbf{1}) *_2 \mathbf{1}) *_3 \mathbf{1} \\
&= (((\operatorname{Diag}(\pi) \cdot (M_1, M_2, M_3)) *_1 \mathbf{1}) *_2 \mathbf{1}) *_3 \mathbf{1} \\
&= ((\operatorname{Diag}(\pi) *_1 M_1 \mathbf{1}) *_2 M_2 \mathbf{1}) *_3 M_3 \mathbf{1} \\
&= ((\operatorname{Diag}(\pi) *_1 \mathbf{1}) *_2 \mathbf{1}) *_3 \mathbf{1} \\
&= \pi \mathbf{1},
\end{aligned}$$

so $\pi$ is a valid complex root distribution. Thus, P is in the image of $\psi_T$ for complex, 
nonsingular parameters. 

The forward implication in the theorem is straightforward. 

The uniqueness of nonsingular parameters up to permutation of states at the 
internal node of the tree was discussed at the end of subsection 2.3. □ 
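
The construction in this proof can be traced numerically. The sketch below uses Python with NumPy (our illustrative choice); the function names are hypothetical, and the real matrices $g_i$ are assumed perturbations of the identity, chosen only so that they are invertible with nonzero row sums.

```python
import numpy as np

def act_on_diag_ones(g1, g2, g3):
    """Diag(1) . (g1, g2, g3): entry (a, b, c) is sum_i g1[i,a] g2[i,b] g3[i,c]."""
    return np.einsum("ia,ib,ic->abc", g1, g2, g3)

def recover_parameters(g1, g2, g3):
    """Follow the proof: r^i = g_i 1, M_i = diag(r^i)^{-1} g_i, pi = r^1 * r^2 * r^3."""
    gs = (g1, g2, g3)
    rs = [g.sum(axis=1) for g in gs]              # row sums, assumed nonzero
    Ms = [g / r[:, None] for g, r in zip(gs, rs)]
    pi = rs[0] * rs[1] * rs[2]                    # entry-wise product
    return pi, Ms, rs

rng = np.random.default_rng(1)
k = 3
g1, g2, g3 = (np.eye(k) + 0.2 * rng.normal(size=(k, k)) for _ in range(3))
g1 = g1 / act_on_diag_ones(g1, g2, g3).sum()      # rescale so P is a distribution

P = act_on_diag_ones(g1, g2, g3)
pi, (M1, M2, M3), (r1, r2, r3) = recover_parameters(g1, g2, g3)
P_rebuilt = np.einsum("i,ia,ib,ic->abc", pi, M1, M2, M3)
```

The recovered matrices have unit row sums and the recovered root vector sums to 1, but nothing forces their entries to be nonnegative, which is the gap the quadratic-form conditions that follow are designed to close.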
Combining this Proposition with Theorems 3.1 and 3.2 we obtain the following. 

Corollary 3.4. A $k \times k \times k$ complex distribution P is the image of complex, 
nonsingular parameters for GM(k) on the 3-leaf tree if, and only if, it satisfies the 
semialgebraic conditions (i) and (ii) of Theorem 3.2 and 

(iii) for i = 1, 2, 3, $\det(P *_i \mathbf{1}) \neq 0$. 

For k = 2, P is the image of real nonsingular parameters for GM(2) on the 3-leaf 
tree if, and only if, it satisfies $\Delta(P) > 0$ and the semialgebraic conditions (iii). 

Next we characterize the image of nonsingular stochastic parameters, and finally 
of strictly positive nonsingular parameters. The key to this step is the construction 
of certain quadratic forms whose positive semi-definiteness (respectively definiteness) 
encodes nonnegativity (respectively positivity) of some of the numerical parameters. 
Sylvester's Theorem [35], which we state for reference here, then gives a semialgebraic 
version of these conditions. 

Recall that a principal minor of a matrix is the determinant of a submatrix chosen 
with the same row and column indices, and that a leading principal minor is one of 
these where the chosen indices are {1, 2, 3, . . . , k} for any k. 

Theorem 3.5 (Sylvester's Theorem). Let A be an n x n real symmetric matrix 
and $Q(x) = x^T A x$ the associated quadratic form on $\mathbb{R}^n$. Then 

1. Q is positive semidefinite if, and only if, all principal minors of A are non- 
negative, and 

2. Q is positive definite if, and only if, all leading principal minors of A are 
strictly positive. 
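
Both parts of Sylvester's Theorem translate directly into minor computations. The following is a minimal sketch in Python with NumPy (our illustrative choice); the function names, the tolerance, and the test matrices are assumptions. The last matrix shows why the semidefinite test needs all principal minors and not only the leading ones.

```python
import numpy as np
from itertools import combinations

def leading_principal_minors(A):
    """Determinants of the top-left j x j blocks, j = 1, ..., n."""
    return [np.linalg.det(A[:j, :j]) for j in range(1, A.shape[0] + 1)]

def principal_minors(A):
    """Determinants of all submatrices with equal row and column index sets."""
    n = A.shape[0]
    return [np.linalg.det(A[np.ix_(idx, idx)])
            for j in range(1, n + 1)
            for idx in combinations(range(n), j)]

def is_positive_definite(A):
    return all(m > 0 for m in leading_principal_minors(A))

def is_positive_semidefinite(A, tol=1e-12):
    # tol is an assumed numerical slack for floating-point determinants.
    return all(m >= -tol for m in principal_minors(A))

A_pd = np.array([[2.0, -1.0, 0.0],
                 [-1.0, 2.0, -1.0],
                 [0.0, -1.0, 2.0]])      # positive definite
A_psd = np.array([[1.0, 1.0],
                  [1.0, 1.0]])           # semidefinite but not definite
A_bad = np.array([[0.0, 0.0],
                  [0.0, -1.0]])          # leading minors are 0, yet not semidefinite
```

For `A_bad` every leading principal minor vanishes, but the principal minor taken from the second row and column is negative, so the form is not positive semidefinite.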

We use Sylvester's Theorem to establish the following theorem. 

Theorem 3.6. A $k \times k \times k$ tensor P is the image of nonsingular stochastic 
parameters for the GM(k) model on the 3-leaf tree if, and only if, its entries are 
nonnegative and sum to 1, conditions (i), (ii), and (iii) of Theorem 3.2 and Corollary 
3.4 are satisfied, and 

(iv) all leading principal minors of 

$$\det(P_{\cdot\cdot+}) \, P_{+\cdot\cdot}^T \operatorname{adj}(P_{\cdot\cdot+}) \, P_{\cdot+\cdot} \tag{3.2}$$

are positive, and all principal minors of the following matrices are nonnegative: 

$$\begin{aligned}
&\det(P_{\cdot\cdot+}) \, P_{i\cdot\cdot}^T \operatorname{adj}(P_{\cdot\cdot+}) \, P_{\cdot+\cdot}, \quad \text{for } i = 1, \dots, k, \\
&\det(P_{\cdot\cdot+}) \, P_{+\cdot\cdot}^T \operatorname{adj}(P_{\cdot\cdot+}) \, P_{\cdot i\cdot}, \quad \text{for } i = 1, \dots, k, \\
&\det(P_{+\cdot\cdot}) \, P_{\cdot+\cdot} \operatorname{adj}(P_{+\cdot\cdot}) \, P_{\cdot\cdot i}^T, \quad \text{for } i = 1, \dots, k.
\end{aligned} \tag{3.3}$$

Moreover, P is the image of nonsingular positive parameters if, and only if, its 
entries are positive and sum to 1, conditions (i), (ii), and (iii) are satisfied and 

(iv') all leading principal minors of the matrices in (3.2) and (3.3) are positive. 

In both of these cases, the nonsingular parameters are unique up to label swapping.

Proof. Let P be an arbitrary nonnegative $k \times k \times k$ tensor whose entries sum to 1. 
By Corollary 3.4, the first 3 conditions are equivalent to $P = \psi_T(\pi, \{M_1, M_2, M_3\})$ 
for complex nonsingular parameters. We need to show the addition of assumption 
(iv) is equivalent to parameters being nonnegative. 
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Note that 



P..+ 
P.+. 



P*i 1 



P*3 1 



P*2 1 



A/f diag(7r)M2, 
diag(7r)M3, 
A/J diag(7r)Af3. 



Since P..+ is nonsingular, we find 



P^.P.T+P. 



AI^ diag(7r)Af3 



(3.4) 



is symmetric, and the matrix of a positive definite quadratic form if, and only if the 
entries of tt are positive. Equivalently, by Sylvester's theorem, all leading principal 
minors of this matrix must be positive. 
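As a sanity check, the identity $P_{+\cdot\cdot}^T P_{\cdot\cdot+}^{-1} P_{\cdot+\cdot} = M_3^T \operatorname{diag}(\pi) M_3$ can be verified in exact arithmetic for $k = 2$. The sketch below is illustrative only: the parameter values and all function names are assumptions made here, not taken from the text, and `inverse2` restricts it to the 2-state case. It also shows how a root vector with a negative entry is detected by a negative leading principal minor.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][r] * b[r][j] for r in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse2(a):
    """Inverse of a nonsingular 2 x 2 matrix."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def tensor(pi, m1, m2, m3):
    """P(i,j,l) = sum_r pi[r] * m1[r][i] * m2[r][j] * m3[r][l]."""
    n = len(pi)
    return [[[sum(pi[r] * m1[r][i] * m2[r][j] * m3[r][l] for r in range(n))
              for l in range(n)] for j in range(n)] for i in range(n)]

def marginal(p, axis):
    """Sum p over one index: axis 0 gives P_{+..}, 1 gives P_{.+.}, 2 gives P_{..+}."""
    n = len(p)
    if axis == 0:
        return [[sum(p[i][j][l] for i in range(n)) for l in range(n)] for j in range(n)]
    if axis == 1:
        return [[sum(p[i][j][l] for j in range(n)) for l in range(n)] for i in range(n)]
    return [[sum(p[i][j][l] for l in range(n)) for j in range(n)] for i in range(n)]

def root_form(p):
    """Left side of the identity: P_{+..}^T P_{..+}^{-1} P_{.+.}."""
    return matmul(matmul(transpose(marginal(p, 0)), inverse2(marginal(p, 2))),
                  marginal(p, 1))

def expected_form(pi, m3):
    """Right side of the identity: M_3^T diag(pi) M_3."""
    n = len(pi)
    d = [[pi[r] if r == s else F(0) for s in range(n)] for r in range(n)]
    return matmul(matmul(transpose(m3), d), m3)

def leading_minors2(a):
    """The two leading principal minors of a 2 x 2 matrix."""
    return [a[0][0], a[0][0] * a[1][1] - a[0][1] * a[1][0]]

# Illustrative (assumed) parameters: rows of each M_i sum to 1.
m1 = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
m2 = [[F(4, 5), F(1, 5)], [F(3, 10), F(7, 10)]]
m3 = [[F(3, 5), F(2, 5)], [F(1, 10), F(9, 10)]]
good_pi = [F(3, 10), F(7, 10)]        # stochastic root distribution
bad_pi = [F(101, 100), F(-1, 100)]    # sums to 1 but has a negative entry

good = tensor(good_pi, m1, m2, m3)
bad = tensor(bad_pi, m1, m2, m3)
```

For `good` both leading principal minors of `root_form(good)` are positive, while for `bad` the determinant equals $\det(M_3)^2 \pi_1 \pi_2 < 0$, so the Sylvester test fails as expected.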
Similarly, using slices one has

$$P_{i\cdot\cdot}^T P_{\cdot\cdot+}^{-1} P_{\cdot+\cdot} = M_3^T \operatorname{diag}(\pi) \Lambda_{1,i} M_3,$$

where $\Lambda_{1,i} = \operatorname{diag}(M_1 e_i)$ is the diagonal matrix with entries from the $i$th column of
$M_1$. Thus its principal minors being nonnegative is equivalent (since we already have
that the entries of $\pi$ are positive) by Sylvester's Theorem to the entries in the $i$th
column of $M_1$ being nonnegative. The product $P_{+\cdot\cdot}^T P_{\cdot\cdot+}^{-1} P_{\cdot i\cdot}$ similarly can be used for
a condition that the $i$th column of $M_2$ be nonnegative, and the product
$P_{\cdot+\cdot} P_{+\cdot\cdot}^{-1} P_{\cdot\cdot i}^T$ for the columns of $M_3$.

Multiplying all these matrices by the square of an appropriate nonzero determinant
clears denominators and preserves signs, yielding (3.2) and (3.3).

For the second statement, that the matrices in (3.2) and (3.3) have positive leading
principal minors is equivalent, by Sylvester's Theorem, to the positive definiteness of
the quadratic forms, which in turn is equivalent to the positiveness of parameters.
Since these parameters are nonsingular, the only source of non-uniqueness is label
swapping. □

Remark. Matrix products such as that of equation (3.4) appeared in [3], where
their symmetry was used to produce phylogenetic invariants, but their usefulness for
stating nonnegativity of parameters was overlooked.

Remark. The $j \times j$ minors of the matrices in (3.2) and (3.3) are polynomials
in the entries of $P$ of degree $j(2k + 1)$, with $j = 1, \dots, k$. However, as the leading
determinant in those products is real and nonzero, one can remove an even power of
it without affecting the sign of the minors. Thus the polynomial inequality of degree
$j(2k + 1)$ can be replaced by one of lower degree, $j(k + 1) + e_j k$, where $e_j = 0$ or $1$ is
the parity of $j$.

In the case of the 2-state model, the above result can be made more complete,
by also explicitly describing the image of singular parameters. While semialgebraic
characterizations of probability distributions for both nonsingular and singular
parameters on the 3-leaf tree have been given previously by [32], [10], [23], and [37], we
provide another since our approach is novel.

Theorem 3.7. A tensor $P$ is in the image of the stochastic parameterization map
$\psi_T$ for the GM(2) model on the 3-leaf tree if, and only if, its entries are nonnegative
and sum to 1, and one of the following occur:

1. $\Delta(P) > 0$, $\det(P *_i \mathbf{1}) \neq 0$ for $i = 1, 2, 3$, all leading principal minors of

$$\det(P_{\cdot\cdot+})\, P_{+\cdot\cdot}^T \operatorname{adj}(P_{\cdot\cdot+})\, P_{\cdot+\cdot}$$

are positive, and all principal minors of the following six matrices are nonnegative:

$$\begin{aligned}
&\det(P_{\cdot\cdot+})\, P_{i\cdot\cdot}^T \operatorname{adj}(P_{\cdot\cdot+})\, P_{\cdot+\cdot}, && \text{for } i = 1, 2,\\
&\det(P_{\cdot\cdot+})\, P_{+\cdot\cdot}^T \operatorname{adj}(P_{\cdot\cdot+})\, P_{\cdot i\cdot}, && \text{for } i = 1, 2,\\
&\det(P_{+\cdot\cdot})\, P_{\cdot+\cdot} \operatorname{adj}(P_{+\cdot\cdot})\, P_{\cdot\cdot i}^T, && \text{for } i = 1, 2.
\end{aligned}$$

In this case, $P$ is the image of unique (up to label swapping) nonsingular parameters.

2. $\Delta(P) = 0$, and all $2 \times 2$ minors of at least one of the matrices $\operatorname{Flat}_{1|23}(P)$,
$\operatorname{Flat}_{2|13}(P)$, $\operatorname{Flat}_{3|12}(P)$ are zero. In this case, $P$ arises from singular parameters. If
$P$ has all positive entries, then it is the image of infinitely many singular stochastic
parameter choices.

Proof. Using Theorem 3.6 and the observations made for $k = 2$ immediately
following the statement of Theorem 3.2, case 1 is already established under the weaker
condition that $\Delta(P) \neq 0$. However, since the parameters are nonsingular and real
when $\Delta(P) \neq 0$ and the conditions of case 1 are satisfied, by Theorem 3.1 we may
assume equivalently that $\Delta(P) > 0$.

To establish case 2, first assume $P = \psi_T(\pi, \{M_1, M_2, M_3\})$ is the image of singular
stochastic parameters. Then certainly $P$ has nonnegative entries summing to 1, and
by equations ((TT|) and ([XT]), $\Delta(P) = 0$. Since

$$\operatorname{Flat}_{1|23}(P) = M_1^T \operatorname{diag}(\pi) M,$$

where $M$ is the $2 \times 4$ matrix obtained by taking the tensor product of corresponding
rows of $M_2$ and $M_3$, this flattening has rank 1 if $\pi$ has a zero entry or $M_1$ has rank
1. Similar products for the other flattenings show that singular parameters imply at
least one of the flattenings $\operatorname{Flat}_{1|23}(P)$, $\operatorname{Flat}_{2|13}(P)$, $\operatorname{Flat}_{3|12}(P)$ has rank 1, and hence
its $2 \times 2$ minors vanish.

Conversely, suppose $\Delta(P) = 0$ and at least one of the flattenings has vanishing
$2 \times 2$ minors, and hence rank 1. Then by the classification of orbits given in [16,
Table 7.1], $P$ is in the $GL(2, \mathbb{R})^3$-orbit of one of the following four tensors: the tensor
$\operatorname{Diag}(1, 0)$ (in which case all three flattenings have rank 1) or one of the 3 tensors with
parallel slices $I$ and the zero matrix (in which case exactly one of the flattenings has
rank 1).

If $P = \operatorname{Diag}(1, 0) \cdot (g_1, g_2, g_3)$, then $P(i,j,k) = g_1(1,i)\, g_2(1,j)\, g_3(1,k)$. Since the
entries of $P$ are nonnegative and sum to 1, one sees the top rows of each $g_i$ can
be chosen to be nonnegative, summing to 1. The bottom row of each $g_i$ can also
be replaced with any nonnegative row summing to 1 that is independent of the top
row. Taking $\pi = (1, 0)$, this gives us infinitely many choices of singular stochastic
parameters giving rise to $P$. Alternatively, one could choose each Markov matrix
to have two identical rows, and any $\pi$ with nonzero entries to obtain other singular
stochastic parameters leading to $P$.

For the remaining cases assume, without loss of generality, that $P = E \cdot (g_1, g_2, g_3)$,
where $E_{\cdot\cdot 1} = (1/2) I$, and $E_{\cdot\cdot 2}$ is the zero matrix. Then $P_{\cdot\cdot 1} = \tfrac{1}{2} g_3(1,1)(g_1^T g_2)$ and
$P_{\cdot\cdot 2} = \tfrac{1}{2} g_3(1,2)(g_1^T g_2)$. Since the entries of $P$ are nonnegative and add to 1, we may
assume that the top row of $g_3$ is also nonnegative and adds to 1. Choose $M_3$ to
have two identical rows matching the top row of $g_3$. Now $P_{\cdot\cdot+} = \tfrac{1}{2} g_1^T g_2$ is a rank-2
nonnegative matrix with entries adding to 1. Such a matrix can be written in the form
$P_{\cdot\cdot+} = M_1^T \operatorname{diag}(\pi) M_2$ with, for instance, $M_1 = I$, $\pi = P_{\cdot++}$, $M_2 = \operatorname{diag}(\pi)^{-1} P_{\cdot\cdot+}$.
Then one has $P = \psi_T(\pi, \{M_1, M_2, M_3\})$. If $P$ has positive entries one may also choose
$M_1$ sufficiently close to $I$ so that $M_1$, $\pi = (M_1^T)^{-1} P_{\cdot++}$, $M_2 = \operatorname{diag}(\pi)^{-1} (M_1^T)^{-1} P_{\cdot\cdot+}$
all have nonnegative entries, thus obtaining infinitely many singular parameter choices
leading to $P$. (The example of $P = E$ shows that with only nonnegative entries there
may be only finitely many singular parameter choices leading to $P$.) □

Remark. The analysis of the singular parameter case in this proof, by appealing
without explanation to [16, Table 7.1], has not made explicit the importance of the
notion of tensor rank. Indeed, that concept is central to both [16] and [1] and thus
plays a crucial behind-the-scenes role in this work as well. The first singular case, a
tensor in the orbit of $\operatorname{Diag}(1, 0)$, is of tensor rank 1, while the second, a tensor in the
orbit of $E$, is of tensor rank 2 yet multilinear rank (2, 2, 1). The nonsingular case is
those of tensor rank 2 and multilinear rank (2, 2, 2).

Remark. In case 1 of the theorem, the polynomial inequalities are of degree 4
and 2 (from $\Delta$ and the determinants) and degree 5 and 10 (from the minors), but
the degree 10 ones can be lowered to degree 6 by removing a factor of a determinant
squared. The polynomial equalities appearing in case 2 have degree 4 and 2, with
the quadratics simply expressing that one of the leaf variables is independent of the
others.

Minor modifications to the argument give the extension to positive parameters 
below. 

Theorem 3.8. A tensor $P$ is in the image of the positive stochastic parameterization
map for the GM(2) model on the 3-leaf tree if, and only if, its entries are
positive and sum to 1, and the conditions of Theorem 3.7 are met with the following
modification to case 1: all leading principal minors of the matrices are positive.

Proof. Case 1 is proved by combining the arguments for Theorems 3.6 and 3.7.
For case 2, if $P$ is in the orbit of $\operatorname{Diag}(1, 0)$ the argument of Theorem 3.7 still
applies, replacing 'nonnegative' with 'positive', and using the second construction of
singular parameters. If $P$ is in the orbit of $E$ we simply replace 'nonnegative' with
'positive' in the argument. □

As mentioned above, semialgebraic descriptions of the binary general Markov
model on trees have been given previously, but in ways where generalizations to
$k$-state models were not apparent. Although we have considered only the 3-leaf tree
(with larger trees to be discussed in §5) thus far, we pause here to discuss some
connections to the recent works of Zwiernik and Smith [37] and Klaere and Liebscher
[23] on semialgebraic descriptions of the binary model on trees, as well as the older
work of Pearl and Tarsi [29].

Though using different approaches (and in the case of [29] with different goals),
all three of these works emphasize statistical interpretations of various quantities
computed from a probability distribution $P$ (e.g., covariances, conditional covariances,
moments, tree cumulants). While analogs of some of the same quantities appear in
our generalization to $k$-state models, we have used algebra, rather than statistics, to
guide our derivation. Although an inequality such as $\det(P *_i \mathbf{1}) \neq 0$, which appears
in our description, can be given a simple statistical interpretation when $k = 2$ (that
two leaf variables are not independent), for larger $k$ its meaning is more subtle, as it
is tied to our notion of nonsingular parameters. Thus our generalization to $k$-state
models uses a more detailed development than the simple generalization of statistical
concepts from $k = 2$ to larger $k$.

The role played by the hyperdeterminant $\Delta$ in giving a semialgebraic model
description for $k = 2$ was first made clear in [37]. Its role was essentially independently
discovered in [23, Theorem 6], though without recognition that it is a classical
algebraic object. Indeed, both works recognize $\Delta$ as an expression in the 2-variable
and 3-variable covariances (i.e., central moments). This is a fascinating intertwining
of algebra and statistics, yet we did not find it helpful in understanding the correct
analog of $\Delta$ for higher $k$; rather, [1] develops the analog $f_i$ used here through algebraic
motivation entirely. It would nonetheless be interesting to understand whether $f_i$ can
be described in more statistical language.

The explicit semialgebraic model descriptions for the 2-state model given in both
[37] and [23] take on quite different forms than ours. This is not a surprise as such a
description is far from unique, and different reasoning may produce different inequalities.
The version in [37], for example, is stated in a different coordinate system, using
tree cumulants rather than the entries of $P$. We find all these descriptions valuable, as
what should be considered the simplest, or most natural, description is not obvious.

The focus of [29] is on recovery of parameters for GM(2) on the 3-leaf tree from
a probability distribution assumed to have arisen from stochastic parameters, in an
approach based on earlier work in latent structure analysis [27]. In addressing this
question, however, semialgebraic conditions on the distribution are obtained. For
instance, the non-vanishing of denominators is needed for formulas to make sense,
and thus certain polynomials must be nonzero. (The authors seem to assume
nonsingularity of parameters, though that is never clarified in the paper.) While $\Delta$ never
arises in [29], it is remarkable to note, then, that using the results of Theorem 3.1 and
making explicit the tacit assumptions that various rational expressions exist and are
real, the conditions given in Theorem 1 of [29] are sufficient to show $\Delta(P) > 0$, and
thus $P$ is in the image of stochastic nonsingular parameters. Thus one can extract a
semialgebraic model description from this work, even if that was not its goal.

4. Inequalities for 3-leaf trees via Sturm theory. The semialgebraic model 
descriptions given in the previous section have the advantage of being easily describ- 
able in a uniform way for all k. However, semialgebraic descriptions are not unique, 
and there is no clear notion of what description should be considered simplest. It is 
also of interest, therefore, to obtain alternative polynomial inequalities, possibly of 
lower degree, that must also be satisfied by probability distributions in the model, in 
the hopes that they lead to another, perhaps better, semialgebraic model description, 
or that they might be of further use for testing whether a distribution arises from the 
model. We explore another approach to doing so here. 

4.1. Review of Sturm Theory. Sturm theory can be used to impose conditions
that roots of a univariate polynomial lie in a certain interval. We briefly recall basic
definitions and results. Suppose that $f(x) \in \mathbb{R}[x]$ is a non-constant polynomial of
degree $m$, with no multiple roots, and we wish to count the roots of $f$ in an interval
$(a, b)$ where $f(a) f(b) \neq 0$. Then a Sturm sequence $S$ for $f$ on the interval $[a, b]$ is a
sequence $f = f_0, f_1, \dots, f_m$ of polynomials satisfying certain sign relationships at the
zeros of the $f_j$ in the interval $[a, b]$. We give an example below, but for specifics, see,
for instance, [22]. For any $c \in [a, b]$ which is not a root of any $f_i$, the sign variation
$V_S(c)$ is the number of sign changes in the sequence $f(c), f_1(c), \dots, f_m(c)$.

Theorem 4.1 (Sturm's Theorem). Let $f(x) \in \mathbb{R}[x]$ be a non-constant polynomial
and $S$ a Sturm sequence for $f$ on $[a, b]$. Then the number of distinct roots of $f(x)$ in
$(a, b)$ is equal to $V_S(a) - V_S(b)$.

Though other constructions of Sturm sequences exist, we use the standard sequence,
derived using a modified Euclidean algorithm. If $f = f_0$ is a polynomial of
degree $m > 0$, then set $f_1 = f'$, and for $j = 2, \dots, m$, take $f_j$ to be the opposite of
the remainder of division of $f_{j-2}$ by $f_{j-1}$. This yields a Sturm sequence for $f$ on any
interval $[a, b]$ with $f_j(a) \neq 0$, $f_j(b) \neq 0$.

To illustrate, suppose that $f(x) = x^2 + c_1 x + c_0 \in \mathbb{R}[x]$. Then the standard
sequence is $f$, $2x + c_1$, $\frac{c_1^2}{4} - c_0$. For the particular choice of coefficients ci = — |
and Co — |, using this sequence on $[0, 1]$, we calculate that $f_0(0), f_1(0), f_2(0)$ is the
sequence ^, and $f_0(1), f_1(1), f_2(1)$ = |, |, ^. Thus, $V_S(0) = 2$, $V_S(1) = 0$, so
$f$ has $V_S(0) - V_S(1) = 2$ roots in $(0, 1)$. Indeed, the factorization /(x) = (a; — |;)
shows this directly.
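The standard sequence and the root count of Theorem 4.1 can be sketched in a few lines of Python with exact rational arithmetic. This is a minimal illustration rather than the authors' computation: the helper names are hypothetical, polynomials are stored as coefficient lists with the highest degree first, and the quadratic $x^2 - \tfrac{3}{4}x + \tfrac{1}{8} = (x - \tfrac14)(x - \tfrac12)$ is a choice made here purely for illustration.

```python
from fractions import Fraction

def poly_remainder(num, den):
    """Remainder of num divided by den (coefficient lists, highest degree first)."""
    num = list(num)
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)                      # leading coefficient is now zero
    return num

def strip(p):
    """Drop leading zero coefficients, keeping at least one entry."""
    while len(p) > 1 and p[0] == 0:
        p = p[1:]
    return p

def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def sturm_sequence(f):
    """Standard sequence: f_0 = f, f_1 = f', f_j = -(f_{j-2} mod f_{j-1})."""
    f = [Fraction(c) for c in f]
    seq = [f, derivative(f)]
    while len(seq[-1]) > 1:
        rem = strip(poly_remainder(seq[-2], seq[-1]))
        seq.append([-c for c in rem])
    return seq

def evaluate(p, x):
    value = Fraction(0)
    for c in p:                         # Horner's rule
        value = value * x + c
    return value

def sign_variations(seq, x):
    """Number of sign changes in f_0(x), f_1(x), ..., ignoring zeros."""
    values = [v for v in (evaluate(p, x) for p in seq) if v != 0]
    return sum(1 for s, t in zip(values, values[1:]) if (s > 0) != (t > 0))

def count_roots(f, a, b):
    """Distinct roots of a squarefree f in (a, b), assuming f(a) f(b) != 0."""
    seq = sturm_sequence(f)
    return sign_variations(seq, a) - sign_variations(seq, b)

example = [1, Fraction(-3, 4), Fraction(1, 8)]   # x^2 - (3/4) x + 1/8
```

For `example` the sequence ends in the constant $\tfrac{c_1^2}{4} - c_0 = \tfrac{1}{64}$, the values at $0$ are $\tfrac18, -\tfrac34, \tfrac1{64}$ (two sign changes), and the values at $1$ are $\tfrac38, \tfrac54, \tfrac1{64}$ (none), so `count_roots(example, 0, 1)` gives 2.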

Two comments are in order: First, in this example observe that $f_2(x)$ is one-fourth
the discriminant of $f(x)$, and its positivity ensures that a monic quadratic has
distinct and real roots. This shows that Sturm theory can produce familiar algebraic
expressions, such as the quadratic discriminant, and thus gives a tool for generalizing
them. The second observation we state more formally, as it is needed in our arguments
below.

Corollary 4.2. If $f(x)$ is of degree $m$ with neither 0 nor 1 a root, and $S$ is a
Sturm sequence for $f$ on $[0, 1]$, then $f$ has $m$ distinct roots in this interval precisely
when $V_S(0) = m$ and $V_S(1) = 0$.

This corollary allows us to obtain inequalities ensuring a polynomial has distinct
roots in $[0, 1]$: One simply requires that the $f_i(0)$ alternate in being $> 0$ or $< 0$, while
the $f_i(1)$ either be all $> 0$ or all $< 0$. We informally refer to inequalities obtained in
this manner as Sturm sequence inequalities.

4.2. Eigenvalues and Sturm Sequences. We now give a second construction
of inequalities that, if satisfied, ensure that GM($k$) model parameters on a 3-leaf tree
are stochastic. While in §3 we constructed matrices encoding positivity of parameters
through requirements on associated quadratic forms, here we instead construct
matrices whose eigenvalues encode parameters.

Suppose that $P = \psi_T(\pi, \{M_1, M_2, M_3\})$ for complex nonsingular $\{\pi, M_1, M_2, M_3\}$.
Recall that for $i \in [k]$,

$$\begin{aligned}
P_{i\cdot\cdot} &= P *_1 e_i = M_2^T \operatorname{diag}(\pi) \Lambda_{1,i} M_3,\\
P_{+\cdot\cdot} &= P *_1 \mathbf{1} = M_2^T \operatorname{diag}(\pi) M_3,
\end{aligned}$$

where $\Lambda_{1,i} = \operatorname{diag}(M_1 e_i)$ is the diagonal matrix with entries from the $i$th column of
$M_1$. Thus, by nonsingularity of the parameters,

$$A_{1,i} := P_{+\cdot\cdot}^{-1} P_{i\cdot\cdot} = M_3^{-1} \Lambda_{1,i} M_3$$

has as eigenvalues the entries of the $i$th column of $M_1$. (This construction underlies
the proof of parameter identifiability for the GM($k$) model on trees [15], and the
construction of phylogenetic invariants in [3], including the equality in condition (i)
of Theorem 3.2 of this paper.) Similarly we define the matrices

$$\begin{aligned}
A_{2,i} &:= P_{\cdot+\cdot}^{-1} P_{\cdot i\cdot} = M_3^{-1} \Lambda_{2,i} M_3,\\
A_{3,i} &:= P_{\cdot\cdot+}^{-1} P_{\cdot\cdot i} = M_2^{-1} \Lambda_{3,i} M_2,
\end{aligned}$$

with $\Lambda_{j,i} = \operatorname{diag}(M_j e_i)$ the diagonal matrix with entries from the $i$th column of the
matrix $M_j$.
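The eigenvalue construction can be checked numerically for $k = 2$. The sketch below uses illustrative parameter values and hypothetical helper names chosen here (none of them come from the text): it builds $P$, forms $P_{+\cdot\cdot}^{-1} P_{i\cdot\cdot}$, and compares its trace and determinant with the sum and product of the entries of the $i$th column of $M_1$. Indices are 0-based in the code.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][r] * b[r][j] for r in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse2(a):
    """Inverse of a nonsingular 2 x 2 matrix."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def tensor(pi, m1, m2, m3):
    """P(i,j,l) = sum_r pi[r] * m1[r][i] * m2[r][j] * m3[r][l]."""
    n = len(pi)
    return [[[sum(pi[r] * m1[r][i] * m2[r][j] * m3[r][l] for r in range(n))
              for l in range(n)] for j in range(n)] for i in range(n)]

def slice_first(p, i):
    """P_{i..}: the slice with first index fixed at i."""
    return [row[:] for row in p[i]]

def marginal_first(p):
    """P_{+..}: the sum of the slices over the first index."""
    n = len(p)
    return [[sum(p[i][j][l] for i in range(n)) for l in range(n)] for j in range(n)]

def a_matrix(p, i):
    """P_{+..}^{-1} P_{i..} (2-state case only, because of inverse2)."""
    return matmul(inverse2(marginal_first(p)), slice_first(p, i))

def trace_det(a):
    return a[0][0] + a[1][1], a[0][0] * a[1][1] - a[0][1] * a[1][0]

# Illustrative (assumed) nonsingular stochastic parameters.
pi = [F(3, 10), F(7, 10)]
m1 = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
m2 = [[F(4, 5), F(1, 5)], [F(3, 10), F(7, 10)]]
m3 = [[F(3, 5), F(2, 5)], [F(1, 10), F(9, 10)]]
p = tensor(pi, m1, m2, m3)

a0, a1 = a_matrix(p, 0), a_matrix(p, 1)
tr0, det0 = trace_det(a0)   # first column of m1 is (9/10, 1/5)
```

With these values the characteristic polynomial of `a0` is $t^2 - \tfrac{11}{10} t + \tfrac{9}{50}$, whose roots $\tfrac{9}{10}$ and $\tfrac{1}{5}$ are exactly the first column of `m1`. Since the two slices sum to $P_{+\cdot\cdot}$, the matrices `a0` and `a1` also sum to the identity.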

Proposition 4.3. Let $P = \psi_T(\pi, \{M_1, M_2, M_3\})$ be a real $k \times k \times k$ tensor
that is the image of complex nonsingular parameters. For each of the matrices $A_{j,i}$,
$j \in \{1, 2, 3\}$, $i \in [k]$, assume the characteristic polynomial has neither 0 nor 1 as a
root, and let $S_{j,i}$ denote the standard Sturm sequence for it. Then $V_{S_{j,i}}(0) = k$ and
$V_{S_{j,i}}(1) = 0$ for all $i, j$ if and only if the matrices $M_1, M_2, M_3$ are positive Markov
matrices with no repeated element in any column and $\pi$ is real.

Proof. The statement about the $M_i$ follows from Corollary 4.2. Then, since the
$M_i$ are real and invertible,

$$\pi = \left(\left(P \cdot (M_1^{-1}, M_2^{-1}, M_3^{-1})\right) *_1 \mathbf{1}\right) *_2 \mathbf{1}$$

shows $\pi$ is real. □

Each matrix $A_{j,i}$ has entries that are rational in the entries of $P$, with denominator
$\det(P *_j \mathbf{1})$ of degree $k$ and numerator of degree $k$. The characteristic polynomial $f$
of $A_{j,i}$ thus also has coefficients that are rational functions in $P$, and in fact the
non-leading coefficients $c_i$ of $f$ are rational of degree $k$ over $k$. This can be seen explicitly,
for example when $j = 1$, in the following:

$$\begin{aligned}
f(x) &= \det(xI - P_{+\cdot\cdot}^{-1} P_{i\cdot\cdot})\\
&= \det(x P_{+\cdot\cdot}^{-1} P_{+\cdot\cdot} - P_{+\cdot\cdot}^{-1} P_{i\cdot\cdot})\\
&= \det\left(P_{+\cdot\cdot}^{-1}(x P_{+\cdot\cdot} - P_{i\cdot\cdot})\right)\\
&= \frac{\det(x P_{+\cdot\cdot} - P_{i\cdot\cdot})}{\det(P_{+\cdot\cdot})}.
\end{aligned} \tag{4.1}$$

It follows that the Sturm sequence inequalities, which are constructed from the
coefficients $c_i$, are rational in the entries of $P$ as well. Indeed, by multiplying each of
these inequalities by a sufficiently high even power of $\det(P *_j \mathbf{1})$ to avoid changing
signs, these expressions become polynomial in $P$. Thus, one can phrase the conditions
$V_{S_{j,i}}(0) = k$ and $V_{S_{j,i}}(1) = 0$ as a collection of polynomial inequalities. Finally, note
that since $f$ is a monic characteristic polynomial, then $(-1)^k f_0(0) = \det(A_{j,i}) > 0$
and $f_k(0) = f_k(1)$, so the signs of all $f_j(0)$ and $f_j(1)$ are determined. This leads to
the following:

Corollary 4.4. Consider real $k \times k \times k$ tensors $P = \psi_T(\pi, \{M_1, M_2, M_3\})$
arising from complex nonsingular parameters. Then one can give a finite collection
of strict polynomial inequalities that hold precisely when the $M_i$ are positive Markov
matrices with no repeated column entries, and $\pi$ is real.

Remark. For simplicity, we have restricted our discussion to polynomials with no
multiple roots, leading to the constraints on the matrix column entries given above.
However, this restriction can be removed, by considering Sturm sequences for
polynomials with repeated roots. We suggest [24, 28] for further information.

Note that the non-strict versions of the inequalities of Corollary 4.4 must continue
to hold on the closure of the image of parameters described in the corollary. As
this closure includes the image of Markov matrices which may have repeated column
entries, or entries of 0 or 1, or are singular, the non-strict inequalities hold for all
stochastic parameters.

However, some distributions arising from non-stochastic parameters may satisfy
the non-strict inequalities as well. Thus while we have obtained semialgebraic
statements guaranteeing stochasticity of nonsingular parameters of a particular form, this
does not seem to lead to a complete semialgebraic description of the image of all
stochastic parameters for arbitrary $k$. In the case $k = 2$, however, we can do better,
as we show next.

4.3. Sturm sequences for GM(2). In the specific case of the 2-state model,
we explicitly give the inequalities of Corollary 4.4. To this end, suppose that
$P = \psi_T(\pi, \{M_1, M_2, M_3\})$ is a $2 \times 2 \times 2$ probability distribution arising from nonsingular
parameters.

For any $j \in \{1, 2, 3\}$, $i \in [2]$, let $A = A_{j,i}$. The standard Sturm sequence of the
characteristic polynomial of $A$ is

$$\begin{aligned}
f_0 &= t^2 - \operatorname{tr}(A)\, t + \det(A),\\
f_1 &= 2t - \operatorname{tr}(A),\\
f_2 &= \tfrac{1}{4}\left(\operatorname{tr}(A)^2 - 4 \det(A)\right) = \tfrac{1}{4} \delta(f_0),
\end{aligned}$$

where $\delta(f)$ denotes the discriminant of the monic quadratic polynomial $f$.

Since $k = 2$, the nonsingularity of the $M_i$ together with the fact that their rows
sum to 1 implies that none of their columns have repeated entries. Thus the
characteristic polynomial $f_0$ must have distinct roots. To ensure that these roots are in
$(0, 1)$, so that the matrix parameters are Markov, the variation in signs of the Sturm
sequence must be:

$$\begin{aligned}
f_0(0) &= \det(A) > 0, & f_0(1) &= 1 - \operatorname{tr}(A) + \det(A) > 0,\\
f_1(0) &= -\operatorname{tr}(A) < 0, & f_1(1) &= 2 - \operatorname{tr}(A) > 0,\\
f_2(0) &= \tfrac{1}{4} \delta(f) > 0, & f_2(1) &= \tfrac{1}{4} \delta(f) > 0.
\end{aligned} \tag{4.2}$$

Each of these inequalities can be expressed using rational expressions in the entries
of $P$. For instance, for $j = 1$, $i = 1$, the first two inequalities ensuring that the first
column of $M_1$ has distinct entries between 0 and 1 are

$$f_0(0) = \det(A) = \frac{\det(P_{1\cdot\cdot})}{\det(P_{+\cdot\cdot})} > 0 \tag{4.3}$$

and

$$f_1(0) = -\operatorname{tr}(A) = \frac{-2\det(P_{1\cdot\cdot}) + p_{112}p_{221} - p_{211}p_{122} - p_{111}p_{222} + p_{212}p_{121}}{\det(P_{+\cdot\cdot})} < 0.$$

After multiplying each of the inequalities of (4.2) by $\det(P_{+\cdot\cdot})^2$ to clear denominators,
we obtain five distinct polynomial inequalities of degree 4 in the entries of $P$, as well
as one degree-2 inequality

$$\det(P_{+\cdot\cdot}) \neq 0.$$

Note too that $f_2(0) = f_2(1)$ is a positive multiple of the discriminant of $f_0$, and its
positivity guarantees (again) that the roots of $f_0$ are distinct and real.

The inequality (4.3) has a direct statistical interpretation: Assuming the states
of the variables $X_i$ are encoded with numerical values $s$ and $s + 1$, then
$\det(P_{+\cdot\cdot}) = \operatorname{Cov}(X_2, X_3)$ and $\det(P_{1\cdot\cdot})$ is a positive multiple of $\operatorname{Cov}(X_2, X_3 \mid X_1 = 1)$. Thus the
inequality states that the sign of the association of $X_2$ and $X_3$ is the same whether
we have information about $X_1$ or not. Viewing the 3-leaf tree as a graphical model
for nonsingular parameters, this should be expected, but that it arises cleanly from
Sturm theory is a pleasant surprise.

Of additional interest is the observation that $f_2$ can be expressed in terms of the
hyperdeterminant $\Delta(P)$:

$$f_2 = \frac{1}{4} \delta(f_0) = \frac{\Delta(P)}{4 \det(P_{+\cdot\cdot})^2}.$$

Since $\Delta(P) \neq 0$ implies $\det(P_{+\cdot\cdot}) \neq 0$, we find

$$f_2 > 0 \text{ if, and only if, } \Delta(P) > 0. \tag{4.4}$$

In particular, using Theorem 3.1 we see the Sturm inequality involving $f_2$ implies
that $P$ arises from nonsingular real parameters, and thus an additional assumption
of that fact is not needed to supplement the Sturm inequalities.

We further note that the inequalities (4.2) for various $A_{j,i}$ are not independent
of one another. Since $A_{j,1} + A_{j,2} = I$, it follows that the two matrices $A_{j,1}$, $A_{j,2}$ give
rise to the same inequalities.

The inequalities (4.2), unfortunately, are not sufficient to ensure the root
distribution $\pi$ is also positive. For instance,

$$P = \left[\begin{array}{cc|cc}
0.65439 & 0.07191 & 0.16361 & 0.01809\\
0.07191 & 0.00079 & 0.01809 & 0.00121
\end{array}\right]$$

is a probability distribution that satisfies $\Delta(P) > 0$ and the Sturm inequalities for
each $A_{j,i}$, but the root distribution $\pi = (1.01, -0.01)$ is not stochastic.
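The claims about this tensor can be checked in exact arithmetic. The sketch below is an illustration under stated assumptions: the two displayed $2 \times 2$ blocks are taken to be the slices with third index 1 and 2, the hyperdeterminant is computed as the discriminant of the binary quadratic form $\det(xA + yB)$ built from those two slices (a standard expression for the $2 \times 2 \times 2$ case), and the function names are hypothetical. It confirms that the entries sum to 1, that the hyperdeterminant is positive, and that the product of the three marginal determinants divided by the hyperdeterminant is negative, equal to $1.01 \times (-0.01)$.

```python
from fractions import Fraction as F

# ASSUMPTION: block_a and block_b are the slices with third index 1 and 2.
block_a = [["0.65439", "0.07191"], ["0.07191", "0.00079"]]
block_b = [["0.16361", "0.01809"], ["0.01809", "0.00121"]]
p = [[[F(block_a[i][j]), F(block_b[i][j])] for j in range(2)] for i in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def hyperdeterminant(p):
    """Discriminant of det(x*A + y*B), where A and B are the third-index slices."""
    a = [[p[i][j][0] for j in range(2)] for i in range(2)]
    b = [[p[i][j][1] for j in range(2)] for i in range(2)]
    mixed = (a[0][0] * b[1][1] + a[1][1] * b[0][0]
             - a[0][1] * b[1][0] - a[1][0] * b[0][1])
    return mixed * mixed - 4 * det2(a) * det2(b)

def marginal(p, axis):
    """Sum p over one index: axis 0 gives P_{+..}, 1 gives P_{.+.}, 2 gives P_{..+}."""
    if axis == 0:
        return [[sum(p[i][j][l] for i in range(2)) for l in range(2)] for j in range(2)]
    if axis == 1:
        return [[sum(p[i][j][l] for j in range(2)) for l in range(2)] for i in range(2)]
    return [[sum(p[i][j][l] for l in range(2)) for j in range(2)] for i in range(2)]

total = sum(p[i][j][l] for i in range(2) for j in range(2) for l in range(2))
delta = hyperdeterminant(p)
det_product = det2(marginal(p, 0)) * det2(marginal(p, 1)) * det2(marginal(p, 2))
ratio = det_product / delta     # equals pi_1 * pi_2 for nonsingular parameters
```

Since `ratio` is negative while `delta` is positive, the product of the two root probabilities is negative, so one entry of $\pi$ must be negative even though the hyperdeterminant condition holds.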

Nonetheless, in the 2-state case we can construct another inequality in $P$ that
ensures the root distribution is positive. If $P \in \operatorname{Im}(\psi_T)$ for nonsingular real parameters,
then

$$\frac{\det(P_{+\cdot\cdot}) \det(P_{\cdot+\cdot}) \det(P_{\cdot\cdot+})}{\Delta(P)} = \pi_1 \pi_2.$$

This is easily verified using transformation properties of the determinants and $\Delta$ under
the action of $(M_1, M_2, M_3)$ on $\operatorname{Diag}(\pi)$. (See [5, p. 136] for an earlier derivation and
application of this equation.) Since $\pi_1 + \pi_2 = 1$, the positivity of the $\pi_i$ is equivalent
to

$$0 < \frac{\det(P_{+\cdot\cdot}) \det(P_{\cdot+\cdot}) \det(P_{\cdot\cdot+})}{\Delta(P)} \le \frac{1}{4}.$$

Moreover, because $\Delta(P) > 0$, this in turn is equivalent to the inequalities

$$0 < \det(P_{+\cdot\cdot}) \det(P_{\cdot+\cdot}) \det(P_{\cdot\cdot+}) \le \frac{1}{4} \Delta(P). \tag{4.5}$$

Although the second inequality here is not homogeneous, it can be made homogeneous
of degree 6 by multiplication of the right side by $1 = \left(\sum_{i,j,k=1}^{2} p_{ijk}\right)^2$.

Putting this all together, we have an alternative semialgebraic test, to be contrasted
with case 1 of Theorem 3.8, for testing that $\pi$ is stochastic.

Proposition 4.5. The image of the positive nonsingular parameterization map
for GM(2) on a 3-leaf tree can be characterized as the probability distributions
satisfying an explicit collection of strict polynomial inequalities: 3 of degree 2, 13 of degree
4, and 2 of degree 6.

Proof. By the above discussion, the degree-2 inequalities $\det(P *_i \mathbf{1}) \neq 0$ and the
5 degree-4 inequalities arising from (4.2) for each of the 3 choices of $j$ suffice to ensure
that nonsingular parameters exist, with Markov matrices having positive entries. Only
13 of these degree-4 inequalities are distinct, as each $j$ leads to $\Delta(P) > 0$. Then the
degree-6 inequalities of (4.5) ensure $\pi$ has positive entries. □

Note that case 1 of Theorem 3.8 gave a description using 3 degree-2, 1 degree-4,
4 degree-5, and 4 degree-6 polynomials. The description arising from Sturm theory
thus uses fewer degree-5 and 6 polynomials, but more degree-4 ones.

4.4. Sturm sequences for GM(3). We now give several examples of Sturm
sequence inequalities for GM(3) on the 3-taxon tree.

If $A$ is a $3 \times 3$ matrix with positive determinant and characteristic polynomial
$f(x) = x^3 + c_2 x^2 + c_1 x + c_0$ without roots at 0 or 1, then $A$ has 3 distinct eigenvalues
in the interval $(0, 1)$ if, and only if,

$$\begin{aligned}
f_0(0) &= c_0 < 0, & f_0(1) &= 1 + c_2 + c_1 + c_0 > 0,\\
f_1(0) &= c_1 > 0, & f_1(1) &= 3 + 2c_2 + c_1 > 0,\\
f_2(0) &= -c_0 + \tfrac{1}{9} c_1 c_2 < 0, & f_2(1) &= -\tfrac{2}{3} c_1 + \tfrac{2}{9} c_2^2 - c_0 + \tfrac{1}{9} c_1 c_2 > 0,\\
f_3(0) &= \frac{9\,\delta(f)}{4 (c_2^2 - 3 c_1)^2} > 0, & f_3(1) &= \frac{9\,\delta(f)}{4 (c_2^2 - 3 c_1)^2} > 0,
\end{aligned} \tag{4.6}$$

where $\delta(f)$ denotes the discriminant of $f$. Here, of course, $c_0 = -\det(A)$, $c_2 = -\operatorname{tr}(A)$
and $c_1$ is quadratic in the entries of $A$. However, by (4.1), if $A = A_{j,i}$ then each $c_i$ is
rational in the entries of $P$, with numerator of degree 3 and denominator $\det(P *_j \mathbf{1})$.

By multiplying the top 6 inequalities of (4.6) by $\det(P *_j \mathbf{1})^2$, we obtain six
polynomial inequalities of degree 6 in $P$. In the case that $A = A_{1,1} = P_{+\cdot\cdot}^{-1} P_{1\cdot\cdot}$, for
example, the first inequality is

$$0 < -\det(P_{+\cdot\cdot})^2 f_0(0) = \det(P_{+\cdot\cdot}) \det(P_{1\cdot\cdot}).$$

By (4.1) we know that $\det(P_{+\cdot\cdot})^2 c_1$ and $\det(P_{+\cdot\cdot})^2 c_2$ are polynomials of the form
$\det(P_{+\cdot\cdot}) K$, where $K$ is homogeneous of degree 3 in the entries of $P$. Computations
with Maple show $K$ has 42 monomial summands for $c_1$, and 114 monomial summands
for $c_2$.

The inequality of the bottom row of (4.6) can be simplified to $\delta(f) > 0$, which is
of degree 4 in the $c_i$. Thus, $\det(P_{+\cdot\cdot})^4 \delta(f) > 0$ is a polynomial inequality of degree
12 in $P$. We omit writing these Sturm inequalities explicitly.

Instead, we illustrate a more direct application of the relevant Sturm theory, in
which the semialgebraic description of the model is present only implicitly. Consider
the probability distribution with exact rational entries given by

$$P = \left[\begin{array}{ccc|ccc|ccc}
0.1500 & 0.0130 & 0.1053 & 0.0130 & 0.0050 & 0.0153 & 0.1053 & 0.0153 & 0.0776\\
0.0130 & 0.0050 & 0.0153 & 0.0050 & 0.0090 & 0.0093 & 0.0153 & 0.0093 & 0.0186\\
0.1053 & 0.0153 & 0.0776 & 0.0153 & 0.0093 & 0.0186 & 0.0776 & 0.0186 & 0.0620
\end{array}\right] \tag{4.7}$$

One can check that $P$ satisfies the conditions of Theorem 3.2, so $P$ is in the image
of nonsingular complex parameters. Then the values of the Sturm sequence at $x = 0$
and $x = 1$ for the characteristic polynomial of $A_{1,1}$ are approximately as in Table 4.1.
Thus the sign variations are $V_S(0) = 2$ and $V_S(1) = 1$, so the first column of $M_1$ has
exactly 1 distinct real entry. This then implies that either $M_1$ is not real, or that its
first column contains the same entry in all rows. Since the second possibility implies
that a $3 \times 3$ Markov matrix is singular, we can conclude that $P$ does not arise from
stochastic parameters.





            $f_0(x)$    $f_1(x)$    $f_2(x)$    $f_3(x)$
  $x = 0$   -0.1087     0.7225      -0.0117     -0.0283
  $x = 1$    0.1138     0.7225       0.0067     -0.0283

Table 4.1. Sturm sequence values for the characteristic polynomial of $A_{1,1}$ for the tensor $P$ of (4.7).

This example did not actually require the full strength of Sturm's theorem; it
is sufficient to note that the discriminant of the cubic characteristic polynomial is
negative to conclude that the first column of $M_1$ has two complex entries, and one
real one. This is special to the small size of the state space, however. For the case of
most interest in phylogenetics, $k = 4$, the sign of the quartic discriminant alone does
not carry enough information to determine whether all roots of a polynomial are real.
Moreover, even for $k = 3$, the Sturm sequence is needed to ensure roots are between
0 and 1.

One might optimistically hope that some $k = 3$ analog of the hyperdeterminant,
such as those in [1], might arise easily from Sturm theory, as the hyperdeterminant
itself did in the case $k = 2$. Unfortunately this does not appear to be the case, at
least by the most straightforward considerations.

In closing, we note that for $k \ge 3$ we have not given in this section any condition
to ensure that $\pi$ has positive entries. For $k = 2$ we did do so, using the transformation
formula for $\Delta(P)$, but this idea does not seem to generalize. The only way we know to
obtain a semialgebraic condition ensuring this is through the quadratic form approach
used in §3.

5. GM($k$) on $n$-leaf trees. We now extend the results of the previous sections
to $n$-leaf trees, for $n > 3$. To vary the choice of the root node of the tree in our
arguments, we need the following. Similar lemmas are given in [33, Theorem 2] and
[3, Proposition 1].

Lemma 5.1. Suppose stochastic parameters are given for the GM($k$) model on
a tree $T$ with the root located at a specific node of $T$. Then there are stochastic
parameters for $T$ rooted at any other node of $T$, or at a node of valence 2 introduced
along an edge of $T$, which lead to the same distribution. Moreover, if the original
parameters were nonsingular and/or positive, then so are the new ones.

Proof. It is enough to show this on a tree with a single edge, as one may then 
successively apply that result along the edges in a path in a larger tree. 

We show first that the root may be moved from one vertex of an edge to the
other. For this it is sufficient to show that given any probability distribution $\pi$ and
Markov matrix $M$, there exists a probability distribution $\tilde{\pi}$ and Markov matrix $\tilde{M}$
with $\operatorname{diag}(\pi) M = P = \tilde{M}^T \operatorname{diag}(\tilde{\pi})$. This is straightforward if the column sums of $P$
are nonzero. If a column sum of $P$ is zero, and hence all entries in the column are
zero, then the corresponding entry of $\tilde{\pi}$ must be zero while that row of $\tilde{M}$ can be
arbitrary. If the original parameters were nonsingular or positive, then showing that
the new ones are as well is straightforward.
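The re-rooting step on a single edge can be written out explicitly: the new root distribution is the vector of column sums of $P = \operatorname{diag}(\pi) M$, and each row of the new Markov matrix is the corresponding column of $P$ divided by its sum. The following is a minimal sketch with a hypothetical function name `reroot`; for a zero column, where the row is arbitrary, it uses the uniform distribution purely as one concrete choice.

```python
from fractions import Fraction as F

def reroot(pi, m):
    """Return (pi2, m2) with diag(pi) m == transpose(m2) diag(pi2).

    pi is a probability vector and m a Markov matrix (rows sum to 1)."""
    k = len(pi)
    joint = [[pi[i] * m[i][j] for j in range(k)] for i in range(k)]   # P = diag(pi) M
    pi2 = [sum(joint[i][j] for i in range(k)) for j in range(k)]      # column sums of P
    m2 = []
    for j in range(k):
        if pi2[j] != 0:
            m2.append([joint[i][j] / pi2[j] for i in range(k)])
        else:
            # Zero column of P: this row is arbitrary; uniform is one valid choice.
            m2.append([F(1, k)] * k)
    return pi2, m2

def joint_from_new_root(pi2, m2):
    """Recompute P as transpose(m2) diag(pi2), for comparison with diag(pi) m."""
    k = len(pi2)
    return [[m2[j][i] * pi2[j] for j in range(k)] for i in range(k)]

pi = [F(1, 4), F(3, 4)]
m = [[F(1, 2), F(1, 2)], [F(1, 3), F(2, 3)]]
pi2, m2 = reroot(pi, m)
```

For these illustrative values the new root distribution is $(\tfrac38, \tfrac58)$ and the new Markov matrix has rows $(\tfrac13, \tfrac23)$ and $(\tfrac15, \tfrac45)$; both parameterizations give the same joint distribution on the edge.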

If instead we wish to move the root from vertex $a$ on edge $(a, b)$ to a new internal
node $r$ introduced to subdivide the edge, first introduce $r$ and let $M_{(a,r)} = M_{(a,b)}$
and $M_{(r,b)} = I$. Then move the root from $a$ to $r$ as above. For the case of positive
parameters, instead pick $M_{(r,b)}$ to have positive entries but be near enough to $I$ that
$M_{(a,r)} = M_{(a,b)} M_{(r,b)}^{-1}$ has positive entries. □

Note that for real or complex parameters Lemma 5.1 fails to hold as the examples
$\pi = (1/2, 1/2)$, ~ ^2 ^ l) ' ^ "'^ show. (The problem is simply that a
column sum of $\operatorname{diag}(\pi) M$ can be zero though the column is not the zero vector.)
However, if the parameters are nonsingular, we can still move the root by modifying
the above argument. Indeed, nonsingularity of parameters implies that from
$\operatorname{diag}(\pi) M = \tilde{M}^T \operatorname{diag}(\tilde{\pi})$ one can solve for a nonsingular $\tilde{M}$, since the other three
matrices in the equation are nonsingular. This shows the following.

Lemma 5.2. Suppose real or complex nonsingular parameters are given for the 
GM(k) model on a tree T with the root located at a specific node of T. Then there are 
nonsingular parameters for T rooted at any other node of T , or at a node of valence 
2 introduced along an edge of T, which lead to the same distribution. 

We now show that independent subsets of variables allow the question of deter- 
mining if a distribution arises from parameters on a tree to be 'decomposed' into the 
same question for the marginalizations to the subsets. 

Proposition 5.3. Let $P$ be a joint distribution of a set $L$ of $k$-state variables such 
that for some partition $L_1|L_2|\cdots|L_s$ of $L$, the variable sets $L_i$ and $L_j$ are independent 
for all $i \ne j$. Suppose the marginal distribution of each $L_i$ arises from nonsingular 
GM($k$) parameters on a tree $T_i$. Then $P$ arises from GM($k$) parameters on any tree 
$T$ which can be obtained by connecting the trees $T_1, T_2, \dots, T_s$ by the introduction of 
new edges between them (with endpoints possibly subdividing either edges of the $T_i$ or 
previously introduced edges joining some of the $T_i$). 

Note that the converse of this statement, that if $P$ arises from parameters for 
the GM($k$) model on an $|L|$-leaf tree then the marginal distributions of each $L_i$ arise 
from parameters for the GM($k$) model on an $|L_i|$-leaf tree, is well-known, and does 
not require the independence of the variable sets, or nonsingularity of parameters. 

Proof. It is enough to consider a partition of $L$ into two independent subsets, 
$L_1|L_2$. Let $T$ be any tree formed by connecting $T_1$ and $T_2$ by a single edge, possibly 
with endpoints introduced to subdivide edges of one or both of the $T_i$. If $e = (r_1, r_2)$ 
is the edge joining $T_1$ and $T_2$, with $r_i$ in $T_i$, then by Lemma 5.2 we may assume that 
parameters on $T_1$ and $T_2$ are given for roots $r_1$ and $r_2$. We root $T$ at $r_1$ and then 
specify parameters on $T$ as the root distribution $\pi_1$ for $T_1$, all matrix parameters on 
the edges of $T_1$ and $T_2$, and for the edge $e$ the matrix $M_e = \mathbf{1}\pi_2^T$, where $\pi_2$ is the 
root distribution on $T_2$. 

Let $\tilde P$ denote the image of these parameters under $\psi_T$. The edge $e$ of $T$ induces 
the split $L_1|L_2$ of the leaf variables, and flattening with respect to $e$ gives 
$\mathrm{Flat}_e(\tilde P) = A^T C B$, where $A$, $B$ are $k \times k^{|L_1|}$ and $k \times k^{|L_2|}$ matrices depending only on the matrix 
parameters on the subtrees $T_1$ and $T_2$, and 

$$C = \mathrm{diag}(\pi_1) M_e = \mathrm{diag}(\pi_1)\mathbf{1}\pi_2^T = \pi_1 \pi_2^T.$$ 

Indeed, in the stochastic case, $A$ gives probabilities of observations at the leaves $L_1$ 
conditioned on the state at $r_1$, $B$ gives probabilities of observations at the leaves $L_2$ 
conditioned on the state at $r_2$, and $C$ is a matrix giving the joint distribution of states 
at $r_1$ and $r_2$. Observing that $A^T C B = (A^T \pi_1)(\pi_2^T B)$, independence implies that $\tilde P$ is 
the product of the same marginal distributions on $L_1$ and $L_2$ as $P$, and hence $\tilde P = P$. 
□ 
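The choice of matrix on the joining edge can be illustrated with a small computation. In this sketch the helper name `joining_edge` and the two root distributions are hypothetical; it only checks that a matrix whose rows all equal $\pi_2$ is a Markov matrix and that the resulting joint distribution of the two root states is the rank-one product $\pi_1\pi_2^T$.

```python
from fractions import Fraction as Fr

def joining_edge(pi1, pi2):
    """Return (M_e, C) with M_e = 1 pi2^T and C = diag(pi1) M_e."""
    k = len(pi1)
    # Every row of M_e equals pi2, so the state at r2 ignores the state at r1.
    M_e = [list(pi2) for _ in range(k)]
    C = [[pi1[i] * M_e[i][j] for j in range(k)] for i in range(k)]
    return M_e, C

# Hypothetical root distributions for the two subtrees.
pi1 = [Fr(1, 4), Fr(3, 4)]
pi2 = [Fr(2, 5), Fr(3, 5)]
M_e, C = joining_edge(pi1, pi2)
```

Note that `M_e` is singular (rank 1) even though the parameters on each subtree are nonsingular, which is consistent with the proposition asserting only that $P$ arises from GM($k$) parameters on $T$.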

One can replace the assumption of nonsingularity of parameters in this proposition 
with one of stochasticity, since the main technical point in the proof is that we need 
freedom to move the root to any node of valence 2 along an edge of T. By Lemma 
5.1 this holds for stochastic parameters, so we obtain the following. 

Proposition 5.4. With $P$ and the $L_i$ as in Proposition 5.3, if the marginal 
distributions of each $L_i$ arise from stochastic parameters for the GM($k$) model on an 
$|L_i|$-leaf tree, then $P$ arises from stochastic parameters for the GM($k$) model on an 
$|L|$-leaf tree. If the parameters on the $|L_i|$-leaf trees are positive, so are those on the 
$|L|$-leaf tree. 

By this proposition, the only sets we must understand to build a semialgebraic 
description for the full $n$-leaf stochastic model are the images of parameters for $m$-leaf 
trees, $m \le n$, when no subsets of the $m$ leaf variables are independent. In the case 
$k = 2$, by Proposition 2.2 this is precisely the images of nonsingular parameters. 
Unfortunately, for larger $k$ characterizing such images is much more difficult, and we 
do not accomplish it in this paper. 

One way to think about the difficulty with $k > 2$ is in terms of matrix rank. When 
$k = 2$, a Markov matrix has either rank 1 or rank 2, which results in independent 
variables or nonsingular parameters, respectively. For larger $k$, a Markov matrix 
may have rank between the extremes of 1 and $k$, which again produce independent 
variables or nonsingular parameters. These intermediate cases of singular Markov 
matrices that are not of rank 1 would each require a detailed analysis, both in the 
3-leaf tree case, and then to produce some type of decomposition for larger trees. 

Rather than pursue a detailed semialgebraic model description allowing all ranks 
of Markov matrices for all k, we instead choose to focus on the image of nonsingular 
parameters. These are certainly the most important in most applications. Moreover, 
any distribution arising from singular parameters will lie in the topological closure of 
this set, as singular parameters can be approximated by nonsingular ones. Of course 
the closure may also contain points that do not arise from any parameters, so this does 
not circumvent the difficulties of dealing with the many special cases of intermediate 
rank if an exact semialgebraic description of the full stochastic model is desired. 

In the setting of nonsingular, though not necessarily stochastic parameters, we 
obtain the following. 

Proposition 5.5. Let $P$ be an $n$-dimensional $k \times k \times \cdots \times k$ distribution with 
$n \ge 3$. Then $P$ arises from nonsingular complex parameters on a binary tree $T$ if, 
and only if, 

(i) All marginalizations of $P$ to 3 variables arise from nonsingular parameters 
on the induced 3-leaf, 3-edge trees, and 

(ii) For all internal edges $e$ of $T$, all $(k + 1) \times (k + 1)$ minors of the matrix 
flattening $\mathrm{Flat}_e(P)$ are 0. 

Moreover, such nonsingular parameters are unique up to label swapping at internal 
nodes of $T$. 

Note that condition (i) can be stated in terms of explicit semialgebraic conditions, 
using Corollary 3.4. Also, the polynomial equalities of condition (ii) are usually called 
edge invariants [5]. 

Proof. For the forward implication, condition (i) follows since marginalizations 
arise from the model on the associated induced subtree, using Markov matrices that 
are products of the original ones. Item (ii) is from [5], where it is shown that all 
$P \in \mathrm{Im}(\psi_T)$ satisfy the edge invariants. (The nonsingularity of parameters is not 
required for either of these.) 

For the reverse implication, we proceed by induction on the size $n$ of the variable 
set $L$. The claim holds by assumption in the base case of $n = 3$. Assume the statement 
is true for fewer than $n \ge 4$ variables. We identify leaves of $T$ with the variables 
associated to them. Choose some internal edge $e_0 = (a, b)$ of $T$, corresponding to 
the split $L_1|L_2$ of $L$, with $|L_1|, |L_2| \ge 2$, $a$ in the subtree spanned by $L_1$, and $b$ in 
the subtree spanned by $L_2$. Introducing a vertex $c$ subdividing $(a, b)$, let $T_1$ be the 
subtree with leaves $L_1 \cup \{c\}$ and $T_2$ the subtree with leaves $L_2 \cup \{c\}$. Thus $(a, c)$ in 
$T_1$ and $(b, c)$ in $T_2$ are the edges formed from dividing $(a, b)$. 

Since the edge invariants are satisfied by $P$, $\mathrm{Flat}_{e_0}(P)$ has rank at most $k$. Therefore, 
there exist $k^{|L_1|} \times k$ and $k \times k^{|L_2|}$ matrices $A$, $B$, both of rank at most $k$, with 

$$\mathrm{Flat}_{e_0}(P) = AB.$$ 

Choose a single variable $\ell_2 \in L_2$ and let $Q$ denote the marginalization of $P$ to $L_1 \cup \{\ell_2\}$. 
Then there is a $k^{|L_2|} \times k$ matrix $N$ such that 

$$\mathrm{Flat}_{e_0}(P)N = ABN = \mathrm{Flat}_{e_1}(Q),$$ 

where this last flattening is along the edge $e_1 = (a, \ell_2)$ in the induced subtree on 
$L_1 \cup \{\ell_2\}$. Stated differently, multiplication by $N$ marginalizes over all those leaves 
in $L_2$ except $\ell_2$. 

Since $Q$ also satisfies conditions (i) and (ii), by the inductive hypothesis $Q$ arises 
from nonsingular parameters. Moreover, we see that $\mathrm{Flat}_{e_1}(Q)$ has rank $k$, since 
marginalization over all but one variable in $L_1$ is seen to produce a rank $k$ matrix 
from the nonsingular parameterization. It follows that the $k \times k$ matrix $BN$ has rank 
$k$. Replacing $A$ and $B$ with $AC$ and $C^{-1}B$ for some invertible $k \times k$ matrix $C$, we 
may further assume the rows of $BN$ add to 1. 

Now since $Q$ arises from nonsingular parameters on a $(|L_1| + 1)$-leaf tree isomorphic 
to $T_1$ rooted at $a$, we claim that $Q' = Q *_{\ell_2} (BN)^{-1}$ arises from nonsingular complex 
parameters on $T_1$ for some suitable choice of $B$. Indeed, $Q'$ arises from the same 
parameters as $Q$, except that on the edge $(a, c)$ we use the matrix parameter that 
is the product of the one on the edge leading to $\ell_2$ and $(BN)^{-1}$. Since $(BN)^{-1}$ is 
a nonsingular matrix with rows summing to one, the only condition to check is that 
the marginalization of the resulting distribution to $c$ has no zero entries. But this 
marginalization is $v_c = v_{\ell_2}(BN)^{-1}$, and has a zero entry only if $v_{\ell_2}$ is in the left 
nullspace of one (or more) of the columns of $(BN)^{-1}$. However, replacing $A$ and $B$ 
with $AC$ and $C^{-1}B$ for some appropriate nonsingular matrix $C$ whose rows sum to 
one, we can ensure that $v_c$ has no zero entries. 

Since the parameters producing $Q'$ are nonsingular, by Lemma 5.2 we may reroot 
$T_1$ at $c$, with parameters the root distribution $v_c$, matrices $\{M_e\}$ on all edges of $T_1$ 
corresponding to ones in $T$, and matrix $M_{(c,a)}$ on the edge $(c, a)$. 

Now with $K$ the matrix which marginalizes $\mathrm{Flat}_{e_0}(P)$ over all elements of $L_1$ but 
one, say $\ell_1$, we see 

$$K\,\mathrm{Flat}_{e_0}(P) = KAB = \mathrm{Flat}_{e_2}(U),$$ 

where $U$ is the marginalization of $P$ over the same elements of $L_1$ and the last flattening 
is on $e_2 = (b, \ell_1)$ in the induced subtree, which is isomorphic to $T_2$. But by 
induction $U$ arises from nonsingular parameters on $T_2$ rooted at $b$. Let $M$ be the 
product of the matrix parameters on the edges in the path from $c$ to $\ell_1$ in $T_1$. Then 
$U' = U *_{\ell_1} M^{-1}$ also arises from nonsingular parameters on $T_2$ (checking that its 
marginalization to $c$ is $v_{\ell_1} M^{-1} = v_c$, which has no zeros by construction). 

Note now that $U'$ has flattening $(M^{-1})^T KAB$. But $(M^{-1})^T KA = \mathrm{diag}(v_c)$ by 
construction. Thus $\mathrm{diag}(v_c)B$ is the $c|L_2$ flattening of a tensor arising on $T_2$ from 
nonsingular complex parameters. With the root at $c$, let $M_e$ be the Markov parameters 
for all edges of $T_2$ corresponding to ones in $T$, and $M_{(c,b)}$ the Markov matrix on $(c, b)$. 
The root distribution $v_c$ is the same as for $T_1$. 

It remains to check that $P$ is the image of the parameters on $T$ with subdivided 
edges $(c, a)$ and $(c, b)$ rooted at $c$ given by $v_c$, $\{M_e\}_{e \ne (a,b)}$, $M_{(c,a)}$, and $M_{(c,b)}$. 
But these parameters lead to the distributions $Q'$ and $U'$ on $T_1$ and $T_2$ respectively. 
Since $\mathrm{Flat}_{(a,c)}(Q') = A$ while $\mathrm{Flat}_{(c,b)}(U') = \mathrm{diag}(v_c)B$, the equation $\mathrm{Flat}_{e_0}(P) = AB$ 
shows they produce $P$ on $T$. 

That the parameters are unique, up to label swapping at the internal nodes of T, 
follows from the 3-leaf case. □ 

Note that in establishing the reverse implication in Proposition 5.5 we did not 
use condition (i) for every 3- variable marginalization. Informally, given a tree T one 
could choose a sequence of edges which can be successively 'cut' (by the introduction 
of the node c in the inductive proof above) to produce a forest of 3-taxon trees. Then 
condition (i) is only needed for a subset of the 3-leaf marginalizations, determined by 
the sequence of edges chosen to cut and the choice of the variables denoted $\ell_1$, $\ell_2$ in 
the proof. Similarly, not all edge flattenings of condition (ii) are used: For the first 
edge to be 'cut', one uses the full edge flattening, but after that, only edge flattenings 
of marginalizations to subsets of variables are needed. Thus the full set of conditions 
given in this proposition is actually equivalent to a subset of them, though pinning 
down a precise subset is rather messy and will not be pursued here. 

Supposing now that an $n$-dimensional distribution $P$ arises from nonsingular complex 
parameters on a binary tree $T$, we wish to give semialgebraic conditions that are 
satisfied if, and only if, the parameters are stochastic. By considering only marginalizations 
to 3 variables and appealing to Proposition 3.6 we can give conditions that 
hold precisely when the root distribution and products of matrix parameters along 
any path leading from an interior vertex of $T$ to leaves are stochastic. This immediately 
yields semialgebraic conditions that the root distribution and matrix parameters 
on terminal edges are stochastic. However, additional criteria are needed to ensure 
matrix parameters on interior edges are stochastic. In the 4-leaf case, such criteria 
are given by the following. 

Proposition 5.6. Suppose a tensor $P$ arises from nonsingular complex parameters 
for GM($k$) on the 4-leaf tree $12|34$. If the 3-marginalizations $P_{\cdot\cdot\cdot+}$ and $P_{+\cdot\cdot\cdot}$ arise 
from stochastic parameters and, in addition, all principal minors of the $k^2 \times k^2$ matrix 

$$\det(P_{+\cdot\cdot+})\det(P_{\cdot+\cdot+})\,\mathrm{Flat}_{13|24}\bigl((P *_2 (\mathrm{adj}(P_{+\cdot\cdot+}^T)P_{\cdot+\cdot+}^T)) *_3 (\mathrm{adj}(P_{\cdot+\cdot+})P_{\cdot++\cdot})\bigr) \qquad (5.1)$$ 

are nonnegative, then $P$ arises from stochastic parameters. 

The statement about the minors of the symmetric matrix in (5.1) is of course 
really a requirement that this matrix be positive semidefinite. Also, this matrix could 
instead be replaced by ones where the roles of leaves 1 and 2 or of leaves 3 and 4 have 
been interchanged. 

Proof. Root $T$ at the interior node near leaves 1 and 2. Let $M_i$, $i = 1, 2, 3, 4$, be 
the complex matrix parameter with row sums equal to one on the edge leading to leaf 
$i$, $M_5$ the matrix parameter on the internal edge, and $\pi$ the root distribution. Define 
the matrices 

$$N_{32} = P_{+\cdot\cdot+}^T = M_3^T M_5^T \mathrm{diag}(\pi) M_2,$$ 
$$N_{31} = P_{\cdot+\cdot+}^T = M_3^T M_5^T \mathrm{diag}(\pi) M_1.$$ 

Then 

$$\tilde P = P *_2 (N_{32}^{-1} N_{31})$$ 

is a tensor arising from the same parameters as $P$ except that $M_2$ has been replaced 
with $M_1$. That is, now the same matrix parameter is used on the edges leading to 
leaves 1 and 2. 

Similarly with 

$$N_{14} = P_{\cdot++\cdot} = M_1^T \mathrm{diag}(\pi) M_5 M_4,$$ 
$$N_{13} = P_{\cdot+\cdot+} = M_1^T \mathrm{diag}(\pi) M_5 M_3,$$ 

then 

$$\hat P = \tilde P *_3 (N_{13}^{-1} N_{14}) \qquad (5.2)$$ 

is a tensor arising from the same parameters as $\tilde P$ except that $M_3$ has been replaced 
with $M_4$. 

Consider now the $13|24$ flattening of $\hat P$, a flattening which is not consistent with 
the topology of the underlying tree. As shown in [5], this can be expressed as a 
product of $k^2 \times k^2$ matrices 

$$\mathrm{Flat}_{13|24}(\hat P) = A^T D A, \qquad (5.3)$$ 

where $D$ is the diagonal matrix with the $k^2$ entries of $\mathrm{diag}(\pi)M_5$ on its diagonal, 
and $A = M_1 \otimes M_4$ is the Kronecker product. Because $M_1$, $M_4$ are nonsingular, so 
is $A$. Since conditions on 3-marginals ensure $\pi$ has positive entries, we can ensure 
$M_5$ has nonnegative entries by requiring that $\mathrm{Flat}_{13|24}(\hat P)$ be positive semidefinite. 
Using Sylvester's theorem, this is equivalent to requiring that its principal minors 
be nonnegative. Since the resulting inequalities would involve rational expressions, 
due to the inverses of matrices, we first multiply $\mathrm{Flat}_{13|24}(\hat P)$ by squares of nonzero 
determinants, to remove denominators. □ 
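The factorization of the $13|24$ flattening used in this proof can be checked on a small case. The sketch below takes $k = 2$, uses one matrix `M1` on the edges to leaves 1 and 2 and one matrix `M4` on the edges to leaves 3 and 4, and a deliberately non-stochastic internal matrix `M5`. All numerical values, and the index ordering chosen for the flattening and the Kronecker product, are assumptions made for illustration.

```python
from fractions import Fraction as Fr

k = 2
states = range(k)
pairs = [(a, b) for a in states for b in states]

# Hypothetical parameters: M1 on leaves 1, 2; M4 on leaves 3, 4; M5 internal.
pi = [Fr(1, 2), Fr(1, 2)]
M1 = [[Fr(4, 5), Fr(1, 5)], [Fr(3, 10), Fr(7, 10)]]
M4 = [[Fr(9, 10), Fr(1, 10)], [Fr(2, 5), Fr(3, 5)]]
M5 = [[Fr(11, 10), Fr(-1, 10)], [Fr(-1, 10), Fr(11, 10)]]  # not stochastic

def tensor(pi, M1, M4, M5):
    """P[i,j,m,l] for the tree 12|34 rooted at the node next to leaves 1, 2."""
    return {(i, j, m, l): sum(pi[a] * M5[a][b] * M1[a][i] * M1[a][j]
                              * M4[b][m] * M4[b][l] for (a, b) in pairs)
            for (i, j) in pairs for (m, l) in pairs}

P = tensor(pi, M1, M4, M5)

# Flattening 13|24: rows indexed by (leaf 1, leaf 3), columns by (leaf 2, leaf 4).
flat = [[P[i, j, m, l] for (j, l) in pairs] for (i, m) in pairs]

# A = M1 (x) M4 with rows (a, b) and columns (i, m);
# D holds the entries of diag(pi) M5 in the same (a, b) order.
A = [[M1[a][i] * M4[b][m] for (i, m) in pairs] for (a, b) in pairs]
D = [pi[a] * M5[a][b] for (a, b) in pairs]
ATDA = [[sum(A[r][p] * D[r] * A[r][q] for r in range(k * k))
         for q in range(k * k)] for p in range(k * k)]
```

Because `A` is nonsingular, the signature of `flat` is that of `D`, so the negative entries of `D` produced by the negative entries of `M5` prevent the flattening from being positive semidefinite.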

The matrix in (5.1) has entries of degree $4k + 1$ in those of $P$. After removing 
squares of powers of determinants for even minors, the polynomial inequalities from 
the principal $j \times j$ minors are of degrees $j(2k + 1) + 2k\epsilon_j$, where $\epsilon_j \in \{0, 1\}$ gives the 
parity of $j$. 
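The degree count can be tabulated as follows. This sketch assumes the parity convention that the parity term is 1 for odd $j$ and 0 for even $j$, and it cross-checks the stated formula against a direct count: a $j \times j$ minor of a matrix with entries of degree $4k + 1$ has degree $j(4k+1)$, and the largest even power of the degree-$2k$ determinant prefactor is then removed. The function names are hypothetical.

```python
def minor_degree(j, k):
    """Degree of the reduced j x j principal-minor inequality, by the formula."""
    eps = j % 2  # assumed parity convention: 1 for odd j, 0 for even j
    return j * (2 * k + 1) + 2 * k * eps

def minor_degree_direct(j, k):
    """Direct count: degree j(4k + 1) of the full minor, minus the degree of
    the largest even power of the degree-2k prefactor that can be dropped."""
    even_power = j - j % 2
    return j * (4 * k + 1) - 2 * k * even_power

degrees_k2 = [minor_degree(j, 2) for j in range(1, 5)]
```

For $k = 2$ this gives degrees 9, 10, 19, 20 for $j = 1, 2, 3, 4$.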

Together with Theorems 3.6 and 3.7, the last two propositions yield the following 
theorem. 

Theorem 5.7. Suppose $P$ is an $n$-dimensional joint probability distribution for 
the $k$-state variables $Y_1, \dots, Y_n$. Then $P$ arises from nonsingular stochastic parameters 
for GM($k$) on an $n$-leaf binary tree $T$ if, and only if, 

(i) All marginalizations of $P$ to 3 variables satisfy the conditions of Theorem 
3.6 (or if $k = 2$ of Theorem 3.7) to arise from nonsingular stochastic parameters on 
a 3-leaf tree; 

(ii) For all internal edges $e$ of $T$, the edge invariants are satisfied by $P$, i.e., all 
$(k + 1) \times (k + 1)$ minors of the matrix flattening $\mathrm{Flat}_e(P)$ are 0; 

(iii) For each internal edge $e$ of $T$, and some choice of 4 leaves inducing a quartet 
tree with internal edge $e$, all principal minors of the matrix flattening constructed in 
Proposition 5.6 for the 4-dimensional marginalization are nonnegative. 

While we noted that one can use a smaller set of inequalities than were given in 
Proposition 5.5 to ensure a distribution arises from nonsingular parameters, the full 
set of inequalities given in Theorem 5.7 has additional redundancies. To illustrate, in 
the 4-leaf case checking that only two of the 3-marginals, say $P_{+\cdot\cdot\cdot}$ and $P_{\cdot\cdot\cdot+}$ for the 
tree $12|34$, satisfy the conditions of Proposition 3.6 is sufficient. 

For a 4-variable distribution $P$, it is straightforward to obtain semialgebraic conditions 
ensuring $P$ arises from strictly positive parameters: One need only require 
the more stringent condition (iv') of Proposition 3.6 on the marginalizations $P_{\cdot\cdot\cdot+}$ 
and $P_{+\cdot\cdot\cdot}$ to ensure they arise from strictly positive parameters, and that all leading 
principal minors of the matrix in (5.1) are strictly positive. This then allows one to 
give such conditions applicable to larger trees, establishing the following. 

Theorem 5.8. Semialgebraic conditions that a probability distribution P arises 
from nonsingular positive parameters for GM(k) on a tree T can be explicitly given. 

Note that one can also handle non-binary trees by the techniques of this section. 
To show a distribution arises from nonsingular, or stochastic nonsingular, parameters 
on a non-binary tree, one need only show it arises from parameters on a binary reso- 
lution of the tree, and that the Markov matrix on each edge introduced to obtain the 
resolution is the identity. But semialgebraic conditions that the Markov matrix on an 
internal edge of a 4-leaf tree be $I$ (or a permutation, since label swapping prevents us 
from distinguishing these) amount to requiring that the matrix of equation (5.3) has 
rank k. Indeed, rank k implies that the Markov matrix on the internal edge has only 
k nonzero entries, and since other conditions we have derived imply nonsingularity, 
the matrix must be a permutation. 

We now give an example illustrating that the quadratic form approach of Proposition 5.6, 
and thus of Theorem 5.7, detects a probability distribution that is in the 
image of $\psi_T$ for nonsingular real GM(2) parameters on the 4-taxon tree, where each 
matrix parameter on a terminal edge is stochastic but the one on the internal edge is 
not. By choosing parameters with some care, we can arrange that such a probability 
distribution P satisfies that all 3-marginalizations arise from stochastic parameters, 
yet P does not. Such examples are not new (see for example [H [23l |37]), but we 
include one here to illustrate our methods. 

To create such an example, set the Markov parameter on each terminal edge 
to have positive entries, using, for instance, the same M on each of these 4 edges. 
Then choose the matrix parameter N on the internal edge of the tree to have very 
small negative off-diagonal entries, small enough that both MN and NM are Markov 
matrices. The root distribution may be taken to be any probability distribution with 
positive entries. An example of such an (exact) probability distribution is given by $P$ 
with slices 

$$P_{\cdot\cdot11} = \begin{pmatrix} 0.4005062 & 0.0565718 \\ 0.0565718 & 0.0545702 \end{pmatrix}, \qquad P_{\cdot\cdot21} = \begin{pmatrix} 0.0457358 & 0.0141662 \\ 0.0141662 & 0.0379118 \end{pmatrix},$$ 

$$P_{\cdot\cdot12} = \begin{pmatrix} 0.0457358 & 0.0141662 \\ 0.0141662 & 0.0379118 \end{pmatrix}, \qquad P_{\cdot\cdot22} = \begin{pmatrix} 0.0100222 & 0.0330958 \\ 0.0330958 & 0.1316062 \end{pmatrix}.$$ 

Here $P$ satisfies all conditions of Theorem 5.7 except (iii). A computation shows that 
the principal minors of the matrix in (5.1) are, when rounded to eight decimal places, 
0.00363408, 0.00001744, 0.00000060, and -0.00000005. The negativity of one of these 
shows $P$ does not arise from stochastic parameters. 
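Some of the claims about this example can be checked in exact arithmetic. The sketch below reads the four slices as $2 \times 2$ matrices in the first two indices, with the last two indices fixed at 11, 21, 12, 22; this reading of the layout is an assumption. It verifies that the entries sum to 1, that all $3 \times 3$ minors of the $12|34$ flattening vanish, and it computes the signs of the leading principal minors of $\mathrm{Flat}_{13|24}(P)$ itself. For this particular $P$ the three two-variable marginals entering (5.1) coincide, so the correction factors there reduce to positive scalars and the sign pattern is unaffected. That is an observation about this example rather than a statement from the text, and the sketch does not attempt to reproduce the rounded values quoted above.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Slices indexed by the last two indices (k, l); each is a 2x2 matrix in (i, j).
slices = {
    (1, 1): [["0.4005062", "0.0565718"], ["0.0565718", "0.0545702"]],
    (2, 1): [["0.0457358", "0.0141662"], ["0.0141662", "0.0379118"]],
    (1, 2): [["0.0457358", "0.0141662"], ["0.0141662", "0.0379118"]],
    (2, 2): [["0.0100222", "0.0330958"], ["0.0330958", "0.1316062"]],
}
idx = [(1, 1), (1, 2), (2, 1), (2, 2)]
P = {(i, j, k, l): Fr(slices[(k, l)][i - 1][j - 1])
     for (i, j) in idx for (k, l) in idx}

def det(A):
    """Exact determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** c * A[0][c] * det([r[:c] + r[c + 1:] for r in A[1:]])
               for c in range(len(A)))

def marginal(a, b):
    """2x2 marginal of P on coordinate positions a < b (0-based)."""
    M = [[Fr(0), Fr(0)], [Fr(0), Fr(0)]]
    for key, val in P.items():
        M[key[a] - 1][key[b] - 1] += val
    return M

total = sum(P.values())

# Edge flattening 12|34: rows (i, j), columns (k, l); edge invariants for k = 2.
flat_12_34 = [[P[i, j, k, l] for (k, l) in idx] for (i, j) in idx]
edge_minors = [det([[flat_12_34[r][c] for c in cols] for r in rows])
               for rows in combinations(range(4), 3)
               for cols in combinations(range(4), 3)]

# Flattening 13|24: rows (i, k), columns (j, l); leading principal minors.
flat_13_24 = [[P[i, j, k, l] for (j, l) in idx] for (i, k) in idx]
leading = [det([row[:m] for row in flat_13_24[:m]]) for m in range(1, 5)]
signs = [(x > 0) - (x < 0) for x in leading]
```

The resulting sign pattern is positive for the first three leading principal minors and negative for the last, in agreement with the signs of the values reported above.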

We conclude with a complete semialgebraic description of the 2-state general 
Markov model on a 4-leaf tree without a restriction to nonsingular stochastic parameters. 
This is straightforward to give, since Proposition 2.2 indicates that in this case 
a distribution which arises from parameters either has independent leaf sets (so we 
can decompose the tree using Proposition 5.3), or the parameters were nonsingular 
so Theorem 5.7 applies. 

As observed earlier, the existence of many non-independence cases when k > 2 
prevents us from assembling as complete a result. 

Proposition 5.9. For the 4-leaf tree $12|34$, the image of the stochastic parameter 
space under the general Markov model GM(2) is the union of the following sets of 
nonnegative tensors whose entries add to 1: 

1. Probability distributions of 4 independent variables: $P$ such that all $2 \times 2$ 
minors of every edge flattening vanish (i.e., all edge flattenings have rank 1); 

2. Probability distributions with partition into minimal independent sets of variables 
of size 1, 3, of which there are 4 cases: If the partition is $\{\{Y_1\}, \{Y_2, Y_3, Y_4\}\}$, 
then $P$ such that all $2 \times 2$ minors of $\mathrm{Flat}_{1|234}(P)$ vanish, and $P_{+\cdot\cdot\cdot}$ satisfies the 
conditions of Theorem 3.7, Case 1; 

3. Probability distributions with partition into minimal independent sets of variables 
of size 1, 1, 2, of which there are 6 cases: If the partition is $\{Y_1\}|\{Y_2\}|\{Y_3, Y_4\}$, 
then $P$ such that all $2 \times 2$ minors of $\mathrm{Flat}_{1|234}(P)$ and $\mathrm{Flat}_{2|134}(P)$ vanish, and 
$\det(P_{++\cdot\cdot})$ is nonzero; 

4. Probability distributions with partition into minimal independent sets of variables 
$\{\{Y_1, Y_2\}, \{Y_3, Y_4\}\}$ of size 2, 2: $P$ such that all $2 \times 2$ minors of $\mathrm{Flat}_{12|34}(P)$ 
vanish, yet $\det(P_{\cdot\cdot++})$ and $\det(P_{++\cdot\cdot})$ are nonzero; 

5. Probability distributions with no independent sets of variables: $P$ such that 
the edge invariants for $12|34$ are satisfied, the 3-d marginalizations $P_{+\cdot\cdot\cdot}$ and $P_{\cdot\cdot\cdot+}$ 
satisfy the conditions of Theorem 3.7, Case 1, and all principal minors of the matrix 
constructed in Proposition 5.6 are nonnegative. 

In case 1, the only edge flattenings that are needed are those associated to terminal 
edges. If these all have rank 1, then the flattening for the internal edge does as well. 

In cases 1,2,3, the distributions arise from stochastic parameters on all 3 of the 
binary topological trees with 4 leaves, as well as the star tree. 

Note that all possible partitions of variables do not appear, but only those consis- 
tent with the tree topology. In the 4-leaf case, this has ruled out only the 2 partitions 
of size 2,2 that do not reflect a split in the tree. 

Of course one could extend the above proposition to arbitrary size trees, as long 
as $k = 2$, but the number of possible partitions into independent sets of variables 
grows quickly, so we will not give an explicit statement. 